diff mbox

[next-next-2.6] net: configurable device name hash

Message ID 200911112116.14103.opurdila@ixiacom.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Octavian Purdila Nov. 11, 2009, 7:16 p.m. UTC
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Nov. 11, 2009, 7:21 p.m. UTC | #1
From: Octavian Purdila <opurdila@ixiacom.com>
Date: Wed, 11 Nov 2009 21:16:14 +0200

> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>

We're not doing this sorry.

Dynamically size it at boot time or something, but a config
option is out of the question.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Octavian Purdila Nov. 11, 2009, 7:38 p.m. UTC | #2
On Wednesday 11 November 2009 21:21:20 you wrote:
> From: Octavian Purdila <opurdila@ixiacom.com>
> Date: Wed, 11 Nov 2009 21:16:14 +0200
> 
> > Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
> 
> We're not doing this sorry.
> 
> Dynamically size it at boot time or something, but a config
> option is out of the question.
> 

I don't think we can dynamically size it at boot time since it depends on the 
usage pattern which is impossible to determine at boot time, right?

Would it be acceptable to grow it at runtime, in list_netdevice for instance?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Nov. 11, 2009, 8:08 p.m. UTC | #3
Octavian Purdila a écrit :
> On Wednesday 11 November 2009 21:21:20 you wrote:
>> From: Octavian Purdila <opurdila@ixiacom.com>
>> Date: Wed, 11 Nov 2009 21:16:14 +0200
>>
>>> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
>> We're not doing this sorry.
>>
>> Dynamically size it at boot time or something, but a config
>> option is out of the question.
>>
> 
> I don't think we can dynamically size it at boot time since it depends on the 
> usage pattern which is impossible to determine at boot time, right?
> 
> Would it be acceptable to grow it at runtime, in list_netdevice for instance?

It will be really hard, now we use RCU lookups...

What workload could reasonably need 1.000.000 hash slots, and 16.000.000 netdevices ?


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Octavian Purdila Nov. 11, 2009, 8:32 p.m. UTC | #4
On Wednesday 11 November 2009 22:08:31 you wrote:
> Octavian Purdila a écrit :
> > On Wednesday 11 November 2009 21:21:20 you wrote:
> >> From: Octavian Purdila <opurdila@ixiacom.com>
> >> Date: Wed, 11 Nov 2009 21:16:14 +0200
> >>
> >>> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
> >>
> >> We're not doing this sorry.
> >>
> >> Dynamically size it at boot time or something, but a config
> >> option is out of the question.
> >
> > I don't think we can dynamically size it at boot time since it depends on
> > the usage pattern which is impossible to determine at boot time, right?
> >
> > Would it be acceptable to grow it at runtime, in list_netdevice for
> > instance?
> 
> It will be really hard, now we use RCU lookups...
> 

OK, I've forgot about that :) 

> What workload could reasonably need 1.000.000 hash slots, and 16.000.000
>  netdevices ?
> 

And yes, I clearly get ahead of myself with that 20 bits. 

Lets say we will max it to 14 for machines with over 1G of memory, would it be 
acceptable to consume 64K out of that even if in most of the usecases we will 
only have a handful of interfaces?

So, on second thought, perhaps is better to leave this alone and have those 
few users who need it to change NETDEV_HASHBITS themselves - its not like its 
a too heavy patch to carry around.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Nov. 11, 2009, 8:42 p.m. UTC | #5
From: Octavian Purdila <opurdila@ixiacom.com>
Date: Wed, 11 Nov 2009 21:38:44 +0200

> I don't think we can dynamically size it at boot time since it
> depends on the usage pattern which is impossible to determine at
> boot time, right?

We have no idea how many sockets will be used by the system yet we
dynamically size the socket hash tables.

Please do some research and see how we handle this elsewhere in the
networking.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger Nov. 11, 2009, 9:33 p.m. UTC | #6
On Wed, 11 Nov 2009 12:42:35 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Octavian Purdila <opurdila@ixiacom.com>
> Date: Wed, 11 Nov 2009 21:38:44 +0200
> 
> > I don't think we can dynamically size it at boot time since it
> > depends on the usage pattern which is impossible to determine at
> > boot time, right?
> 
> We have no idea how many sockets will be used by the system yet we
> dynamically size the socket hash tables.
> 
> Please do some research and see how we handle this elsewhere in the
> networking.

dcache also sizes hash bits at boot time on available memory.
See alloc_large_system_hash().
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Octavian Purdila Nov. 11, 2009, 9:47 p.m. UTC | #7
On Wednesday 11 November 2009 23:33:42 you wrote:
> On Wed, 11 Nov 2009 12:42:35 -0800 (PST)
> 
> David Miller <davem@davemloft.net> wrote:
> > From: Octavian Purdila <opurdila@ixiacom.com>
> > Date: Wed, 11 Nov 2009 21:38:44 +0200
> >
> > > I don't think we can dynamically size it at boot time since it
> > > depends on the usage pattern which is impossible to determine at
> > > boot time, right?
> >
> > We have no idea how many sockets will be used by the system yet we
> > dynamically size the socket hash tables.
> >
> > Please do some research and see how we handle this elsewhere in the
> > networking.
> 
> dcache also sizes hash bits at boot time on available memory.
> See alloc_large_system_hash().
> 

Thanks Stephen.

I was actually taking a look at that but I see that the device hash is 
allocated per net namespace which means we can't use 
alloc_large_system_hash().

We could use a similar function that will work in the per namespace 
initialization context, but this might upset net namespace folks since we will 
get a large hash for every namespace.

Not sure what can be done to address that problem now except using a boot 
parameter to override the defaults. A better solution would be to be able to 
use "namespace create" parameters but it appears we don't have this 
possibility, yet.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger Nov. 11, 2009, 10:24 p.m. UTC | #8
On Wed, 11 Nov 2009 23:47:41 +0200
Octavian Purdila <opurdila@ixiacom.com> wrote:

> On Wednesday 11 November 2009 23:33:42 you wrote:
> > On Wed, 11 Nov 2009 12:42:35 -0800 (PST)
> > 
> > David Miller <davem@davemloft.net> wrote:
> > > From: Octavian Purdila <opurdila@ixiacom.com>
> > > Date: Wed, 11 Nov 2009 21:38:44 +0200
> > >
> > > > I don't think we can dynamically size it at boot time since it
> > > > depends on the usage pattern which is impossible to determine at
> > > > boot time, right?
> > >
> > > We have no idea how many sockets will be used by the system yet we
> > > dynamically size the socket hash tables.
> > >
> > > Please do some research and see how we handle this elsewhere in the
> > > networking.
> > 
> > dcache also sizes hash bits at boot time on available memory.
> > See alloc_large_system_hash().
> > 
> 
> Thanks Stephen.
> 
> I was actually taking a look at that but I see that the device hash is 
> allocated per net namespace which means we can't use 
> alloc_large_system_hash().
> 
> We could use a similar function that will work in the per namespace 
> initialization context, but this might upset net namespace folks since we will 
> get a large hash for every namespace.
> 
> Not sure what can be done to address that problem now except using a boot 
> parameter to override the defaults. A better solution would be to be able to 
> use "namespace create" parameters but it appears we don't have this 
> possibility, yet.
> 

Remember though that really hash sizes really don't buy that much more speed.
Going from 256 to 1024 gives a 4x benefit but with 10,000 devices that
just means scanning 10 vs. 40 names. It is not like the file system cache
where name lookup is a major component of overhead.

You can still use alloc_large_system_hash, but just constrain it to a maximum
of order 10 or something.
David Miller Nov. 12, 2009, 2:36 a.m. UTC | #9
From: Octavian Purdila <opurdila@ixiacom.com>
Date: Wed, 11 Nov 2009 23:47:41 +0200

> We could use a similar function that will work in the per namespace
> initialization context, but this might upset net namespace folks
> since we will get a large hash for every namespace.

Use kzalloc(), that's sufficient for a 64K or so hash table which is
way more than you ever will need.

Use the GFP_* flags that will silently (ie. without a log message)
fail, and divide by two until you successfully allocate the table if
you're worried about memory fragmentation at allocation time.

This is so straightforward, I can't believe we're talking so much
about how to implement this, it's a 15 minute hack :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Smith Nov. 12, 2009, 12:46 p.m. UTC | #10
On Wed, 11 Nov 2009 18:36:26 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Octavian Purdila <opurdila@ixiacom.com>
> Date: Wed, 11 Nov 2009 23:47:41 +0200
> 
> > We could use a similar function that will work in the per namespace
> > initialization context, but this might upset net namespace folks
> > since we will get a large hash for every namespace.
> 
> Use kzalloc(), that's sufficient for a 64K or so hash table which is
> way more than you ever will need.
> 
> Use the GFP_* flags that will silently (ie. without a log message)
> fail, and divide by two until you successfully allocate the table if
> you're worried about memory fragmentation at allocation time.
> 
> This is so straightforward, I can't believe we're talking so much
> about how to implement this, it's a 15 minute hack :-)

Yes, but sadly, sometimes there is too much history(!) to be able to be
fully aware of it. "suck-it-and-see" type patches are possibly a
quicker way to find out what people are thinking right now!

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Nov. 12, 2009, 2:09 p.m. UTC | #11
Mark Smith a écrit :
> On Wed, 11 Nov 2009 18:36:26 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
> 
>> From: Octavian Purdila <opurdila@ixiacom.com>
>> Date: Wed, 11 Nov 2009 23:47:41 +0200
>>
>>> We could use a similar function that will work in the per namespace
>>> initialization context, but this might upset net namespace folks
>>> since we will get a large hash for every namespace.
>> Use kzalloc(), that's sufficient for a 64K or so hash table which is
>> way more than you ever will need.
>>
>> Use the GFP_* flags that will silently (ie. without a log message)
>> fail, and divide by two until you successfully allocate the table if
>> you're worried about memory fragmentation at allocation time.
>>
>> This is so straightforward, I can't believe we're talking so much
>> about how to implement this, it's a 15 minute hack :-)
> 
> Yes, but sadly, sometimes there is too much history(!) to be able to be
> fully aware of it. "suck-it-and-see" type patches are possibly a
> quicker way to find out what people are thinking right now!
> 

Before extending hash tables, we should make sure existing algos are going to
scale with millions of netdevices, and they dont scale that much for the moment.
We still have many for_each_netdev() loops...

It's easy to change a constant somewhere in an include file, its less easy to make
real scalability changes :(

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 0addd45..8a129d5 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -29,8 +29,7 @@  struct net_generic;
 struct sock;
 
 
-#define NETDEV_HASHBITS    8
-#define NETDEV_HASHENTRIES (1 << NETDEV_HASHBITS)
+#define NETDEV_HASHENTRIES (1 << CONFIG_NETDEV_HASHBITS)
 
 struct net {
 	atomic_t		count;		/* To decided when the network
diff --git a/net/Kconfig b/net/Kconfig
index 041c35e..f5db7b2 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -45,6 +45,13 @@  config COMPAT_NETLINK_MESSAGES
 
 menu "Networking options"
 
+config NETDEV_HASHBITS
+	int "Network device hash size"
+	range 8 20
+	default 8
+	help
+	  Select network device hash size as a power of 2.
+
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/xfrm/Kconfig"