Message ID | CAGsizz+2nH+Sx+7TjZwEeKBiHbio0BxoN7CgT4Sspp0EzRN6Ug@mail.gmail.com |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
Le mardi 17 janvier 2012 à 00:11 +0100, Štefan Gula a écrit : > I agree with you, but for the start of this feature I believe static > slots size is enough here - same limitation is inside the original > linux bridge code. I have merged hopefully all your comments and here > is the newest patch: 1) Before sending a new version of your patch, please fix your mailer, you can read Documentation/email-clients.txt for hints. Send it to you and check you can apply it. Then, once you are confident its OK, you can send it to netdev. Right now, it doesnt apply, because of unexpected line wraps. # cd next-next # cat /tmp/patch | patch -p1 patching file include/net/ipip.h patching file net/ipv4/Kconfig patching file net/ipv4/ip_gre.c patch: **** malformed patch at line 495: ipgre_tap_bridge_entry *entry) 2) You call ipgre_tap_bridge_fini() from ipgre_exit_net() and ipgre_init_net(), thats completely bogus if CONFIG_NET_NS=y Just remove the struct kmem_cache *ipgre_tap_bridge_cache and use instead kmalloc(sizeof(...))/kfree(ptr) instead. 3) ipgre_tap_bridge_init() should not be __net_init, but __init 4) I cant find why you store 'struct net_device *dev;' in a ipgre_tap_bridge_entry, it seems you never read it ? 5) Also please add an empty line between variable declaration and function body. Also, we prefer an alignement of parameters. For example : static __be32 ipgre_tap_bridge_get_raddr(struct ip_tunnel *tunnel, const unsigned char *addr) { __be32 raddr; struct ipgre_tap_bridge_entry *entry; rcu_read_lock(); entry = __ipgre_tap_bridge_get(tunnel, addr); if (entry == NULL) raddr = 0; else raddr = entry->raddr; rcu_read_unlock(); return raddr; } should be : static __be32 ipgre_tap_bridge_get_raddr(struct ip_tunnel *tunnel, const unsigned char *addr) { __be32 raddr = 0; struct ipgre_tap_bridge_entry *entry; rcu_read_lock(); entry = __ipgre_tap_bridge_get(tunnel, addr); if (entry) raddr = entry->raddr; rcu_read_unlock(); return raddr; } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Dňa 17. januára 2012 5:47, Eric Dumazet <eric.dumazet@gmail.com> napísal/a: > Le mardi 17 janvier 2012 à 00:11 +0100, Štefan Gula a écrit : > >> I agree with you, but for the start of this feature I believe static >> slots size is enough here - same limitation is inside the original >> linux bridge code. I have merged hopefully all your comments and here >> is the newest patch: > > > > 1) Before sending a new version of your patch, please fix your mailer, > you can read Documentation/email-clients.txt for hints. > Sorry, I have to test it as I am using clasical gmail web GUI to send emails > Send it to you and check you can apply it. > Then, once you are confident its OK, you can send it to netdev. > > Right now, it doesnt apply, because of unexpected line wraps. > > # cd next-next > # cat /tmp/patch | patch -p1 > patching file include/net/ipip.h > patching file net/ipv4/Kconfig > patching file net/ipv4/ip_gre.c > patch: **** malformed patch at line 495: ipgre_tap_bridge_entry *entry) > > > 2) You call ipgre_tap_bridge_fini() from ipgre_exit_net() and > ipgre_init_net(), thats completely bogus if CONFIG_NET_NS=y > > Just remove the struct kmem_cache *ipgre_tap_bridge_cache > and use instead kmalloc(sizeof(...))/kfree(ptr) instead. > As this is completely the same part of code from net/bridge/br_fdb.c, can you give me a hint about how change that as I believe it should be changed also there? > 3) ipgre_tap_bridge_init() should not be __net_init, but __init > Ok > > 4) I cant find why you store 'struct net_device *dev;' in a > ipgre_tap_bridge_entry, it seems you never read it ? > yes you are right, it is not needed - removed from code > 5) Also please add an empty line between variable declaration and > function body. Also, we prefer an alignement of parameters. > I used scripts/checkpatch.pl to check my coding styles, but apparently this is missing from it as it never complains before about this. Anyway hopefully done ok based your comments > For example : > > static __be32 ipgre_tap_bridge_get_raddr(struct ip_tunnel *tunnel, > const unsigned char *addr) > { > __be32 raddr; > struct ipgre_tap_bridge_entry *entry; > rcu_read_lock(); > entry = __ipgre_tap_bridge_get(tunnel, addr); > if (entry == NULL) > raddr = 0; > else > raddr = entry->raddr; > rcu_read_unlock(); > return raddr; > } > > should be : > > static __be32 ipgre_tap_bridge_get_raddr(struct ip_tunnel *tunnel, > const unsigned char *addr) > { > __be32 raddr = 0; > struct ipgre_tap_bridge_entry *entry; > > rcu_read_lock(); > entry = __ipgre_tap_bridge_get(tunnel, addr); > if (entry) > raddr = entry->raddr; > rcu_read_unlock(); > > return raddr; > } > > > > > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mardi 17 janvier 2012 à 09:04 +0100, Štefan Gula a écrit : > Dňa 17. januára 2012 5:47, Eric Dumazet <eric.dumazet@gmail.com> napísal/a: > > > > 2) You call ipgre_tap_bridge_fini() from ipgre_exit_net() and > > ipgre_init_net(), thats completely bogus if CONFIG_NET_NS=y > > > > Just remove the struct kmem_cache *ipgre_tap_bridge_cache > > and use instead kmalloc(sizeof(...))/kfree(ptr) instead. > > > As this is completely the same part of code from net/bridge/br_fdb.c, > can you give me a hint about how change that as I believe it should be > changed also there? Please dont copy code you dont understand :( bridge code is ok, but not yours, since you destroy the kmem_cache when a net namespace exits. Either you fix your code, either you change your memory allocations to mere kmalloc()/kfree() calls and dont care of a private kmem_cache you have to create and destroy. Since you ask a SLAB_HWCACHE_ALIGN, the SLUB allocator will anyway merge your kmem_cache with the standard one (kmalloc-64 I guess) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Dňa 17. januára 2012 9:29, Eric Dumazet <eric.dumazet@gmail.com> napísal/a: > Le mardi 17 janvier 2012 à 09:04 +0100, Štefan Gula a écrit : >> Dňa 17. januára 2012 5:47, Eric Dumazet <eric.dumazet@gmail.com> napísal/a: >> > >> > 2) You call ipgre_tap_bridge_fini() from ipgre_exit_net() and >> > ipgre_init_net(), thats completely bogus if CONFIG_NET_NS=y >> > >> > Just remove the struct kmem_cache *ipgre_tap_bridge_cache >> > and use instead kmalloc(sizeof(...))/kfree(ptr) instead. >> > >> As this is completely the same part of code from net/bridge/br_fdb.c, >> can you give me a hint about how change that as I believe it should be >> changed also there? > > Please dont copy code you dont understand :( > > bridge code is ok, but not yours, since you destroy the kmem_cache when > a net namespace exits. > > Either you fix your code, either you change your memory allocations to > mere kmalloc()/kfree() calls and dont care of a private kmem_cache you > have to create and destroy. > > Since you ask a SLAB_HWCACHE_ALIGN, the SLUB allocator will anyway merge > your kmem_cache with the standard one (kmalloc-64 I guess) > > > ok maybe I am getting it wrong, but I am little bit stuck here. I recheck the original bridge code. The difference I recognize is that in bridge code function: br_fdb_init() and br_fdb_fini() are called from module init and module exit functions: br_init and br_deinit in my code they are called from functions: ipgre_init_net and ipgre_exit_net instead of: ipgre_init and ipgre_fini To be honest I am not so familiar enough with kernel structure that I see the difference on the first time. But I think that with your help it can be done easily. The main idea was to create hash-table that is used to determine the destination IPv4 address (part of the entry structure). That hash-table should be different per each gretap interface - I think that's the reason why I put those init and fini inside ipgre_init_net and ipgre_exit_net. Am I right that the placement of this calls is correct or not? If not where those calls should be placed? On the other hand I have no idea how to substitute those two function with a code that you are suggesting kmalloc()/kfree(). I would be glad if you can help me here by providing me example how to substitute those two functions with kmalloc/kfree for the future usage (I am more reverse engineer learner type of person than manuals reading one) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff -uprN -X linux-3.2.1-orig/Documentation/dontdiff linux-3.2.1-orig/include/net/ipip.h linux-3.2.1-my/include/net/ipip.h --- linux-3.2.1-orig/include/net/ipip.h 2012-01-12 20:42:45.000000000 +0100 +++ linux-3.2.1-my/include/net/ipip.h 2012-01-16 11:17:01.000000000 +0100 @@ -27,6 +27,14 @@ struct ip_tunnel { __u32 o_seqno; /* The last output seqno */ int hlen; /* Precalculated GRE header length */ int mlink; +#ifdef CONFIG_NET_IPGRE_BRIDGE +#define GRETAP_BR_HASH_BITS 8 +#define GRETAP_BR_HASH_SIZE (1 << GRETAP_BR_HASH_BITS) + struct hlist_head hash[GRETAP_BR_HASH_SIZE]; + spinlock_t hash_lock; + unsigned long ageing_time; + struct timer_list gc_timer; +#endif struct ip_tunnel_parm parms; diff -uprN -X linux-3.2.1-orig/Documentation/dontdiff linux-3.2.1-orig/net/ipv4/Kconfig linux-3.2.1-my/net/ipv4/Kconfig --- linux-3.2.1-orig/net/ipv4/Kconfig 2012-01-12 20:42:45.000000000 +0100 +++ linux-3.2.1-my/net/ipv4/Kconfig 2012-01-16 12:37:00.000000000 +0100 @@ -211,6 +211,15 @@ config NET_IPGRE_BROADCAST Network), but can be distributed all over the Internet. If you want to do that, say Y here and to "IP multicast routing" below. +config NET_IPGRE_BRIDGE + bool "IP: Ethernet over multipoint GRE over IP" + depends on IP_MULTICAST && NET_IPGRE && NET_IPGRE_BROADCAST + help + Allows you to use multipoint GRE VPN as virtual switch and interconnect + several L2 endpoints over L3 routed infrastructure. It is useful for + creating multipoint L2 VPNs which can be later used inside bridge + interfaces If you want to use. GRE multipoint L2 VPN feature say Y. + config IP_MROUTE bool "IP: multicast routing" depends on IP_MULTICAST diff -uprN -X linux-3.2.1-orig/Documentation/dontdiff linux-3.2.1-orig/net/ipv4/ip_gre.c linux-3.2.1-my/net/ipv4/ip_gre.c --- linux-3.2.1-orig/net/ipv4/ip_gre.c 2012-01-12 20:42:45.000000000 +0100 +++ linux-3.2.1-my/net/ipv4/ip_gre.c 2012-01-17 00:01:17.000000000 +0100 @@ -52,6 +52,11 @@ #include <net/ip6_route.h> #endif +#ifdef CONFIG_NET_IPGRE_BRIDGE +#include <linux/jhash.h> +#include <asm/unaligned.h> +#endif + /* Problems & solutions -------------------- @@ -134,6 +139,184 @@ struct ipgre_net { struct net_device *fb_tunnel_dev; }; +#ifdef CONFIG_NET_IPGRE_BRIDGE + /* + * This part of code includes codes to enable L2 ethernet + * switch virtualization over IP routed infrastructure with + * utilization of multicast capable endpoint using Ethernet + * over GRE + * + * Author: Stefan Gula + * Signed-off-by: Stefan Gula <steweg@gmail.com> + */ +struct ipgre_tap_bridge_entry { + struct hlist_node hlist; + __be32 raddr; + unsigned char addr[ETH_ALEN]; + struct net_device *dev; + struct rcu_head rcu; + unsigned long updated; +}; + +static struct kmem_cache *ipgre_tap_bridge_cache __read_mostly; +static u32 ipgre_salt __read_mostly; + +static int __net_init ipgre_tap_bridge_init(void) +{ + ipgre_tap_bridge_cache = kmem_cache_create("ipgre_tap_bridge_cache", + sizeof(struct ipgre_tap_bridge_entry), + 0, + SLAB_HWCACHE_ALIGN, NULL); + if (!ipgre_tap_bridge_cache) + return -ENOMEM; + get_random_bytes(&ipgre_salt, sizeof(ipgre_salt)); + return 0; +} + +static void ipgre_tap_bridge_fini(void) +{ + kmem_cache_destroy(ipgre_tap_bridge_cache); +} + +static inline int ipgre_tap_bridge_hash(const unsigned char *mac) +{ + u32 key = get_unaligned((u32 *)(mac + 2)); + return jhash_1word(key, ipgre_salt) & (GRETAP_BR_HASH_SIZE - 1); +} + +static inline int ipgre_tap_bridge_has_expired(const struct ip_tunnel *tunnel, + const struct ipgre_tap_bridge_entry *entry) +{ + return time_before_eq(entry->updated + tunnel->ageing_time, + jiffies); +} + +static inline void ipgre_tap_bridge_delete(struct ipgre_tap_bridge_entry *entry) +{ + hlist_del_rcu(&entry->hlist); + kfree_rcu(entry, rcu); +} + +static void ipgre_tap_bridge_cleanup(unsigned long _data) +{ + struct ip_tunnel *tunnel = (struct ip_tunnel *)_data; + unsigned long delay = tunnel->ageing_time; + unsigned long next_timer = jiffies + tunnel->ageing_time; + int i; + spin_lock(&tunnel->hash_lock); + for (i = 0; i < GRETAP_BR_HASH_SIZE; i++) { + struct ipgre_tap_bridge_entry *entry; + struct hlist_node *h, *n; + hlist_for_each_entry_safe(entry, h, n, + &tunnel->hash[i], hlist) + { + unsigned long this_timer; + this_timer = entry->updated + delay; + if (time_before_eq(this_timer, jiffies)) + ipgre_tap_bridge_delete(entry); + else if (time_before(this_timer, next_timer)) + next_timer = this_timer; + } + } + spin_unlock(&tunnel->hash_lock); + mod_timer(&tunnel->gc_timer, round_jiffies_up(next_timer)); +} + +static void ipgre_tap_bridge_flush(struct ip_tunnel *tunnel) +{ + int i; + spin_lock_bh(&tunnel->hash_lock); + for (i = 0; i < GRETAP_BR_HASH_SIZE; i++) { + struct ipgre_tap_bridge_entry *entry; + struct hlist_node *h, *n; + hlist_for_each_entry_safe(entry, h, n, + &tunnel->hash[i], hlist) + { + ipgre_tap_bridge_delete(entry); + } + } + spin_unlock_bh(&tunnel->hash_lock); +} + +static struct ipgre_tap_bridge_entry *__ipgre_tap_bridge_get( + struct ip_tunnel *tunnel, const unsigned char *addr) +{ + struct hlist_node *h; + struct ipgre_tap_bridge_entry *entry; + hlist_for_each_entry_rcu(entry, h, + &tunnel->hash[ipgre_tap_bridge_hash(addr)], hlist) + { + if (!compare_ether_addr(entry->addr, addr)) { + if (unlikely(ipgre_tap_bridge_has_expired(tunnel, + entry))) + break; + return entry; + } + } + + return NULL; +} + +static struct ipgre_tap_bridge_entry *ipgre_tap_bridge_find( + struct hlist_head *head, + const unsigned char *addr) +{ + struct hlist_node *h; + struct ipgre_tap_bridge_entry *entry; + hlist_for_each_entry(entry, h, head, hlist) { + if (!compare_ether_addr(entry->addr, addr)) + return entry; + } + return NULL; +} + + +static struct ipgre_tap_bridge_entry *ipgre_tap_bridge_find_rcu( + struct hlist_head *head, + const unsigned char *addr) +{ + struct hlist_node *h; + struct ipgre_tap_bridge_entry *entry; + hlist_for_each_entry_rcu(entry, h, head, hlist) { + if (!compare_ether_addr(entry->addr, addr)) + return entry; + } + return NULL; +} + +static struct ipgre_tap_bridge_entry *ipgre_tap_bridge_create( + struct hlist_head *head, + u32 source, + const unsigned char *addr, struct net_device *dev) +{ + struct ipgre_tap_bridge_entry *entry; + entry = kmem_cache_alloc(ipgre_tap_bridge_cache, GFP_ATOMIC); + if (entry) { + memcpy(entry->addr, addr, ETH_ALEN); + entry->raddr = source; + entry->dev = dev; + entry->updated = jiffies; + hlist_add_head_rcu(&entry->hlist, head); + } + return entry; +} + +static __be32 ipgre_tap_bridge_get_raddr(struct ip_tunnel *tunnel, + const unsigned char *addr) +{ + __be32 raddr; + struct ipgre_tap_bridge_entry *entry; + rcu_read_lock(); + entry = __ipgre_tap_bridge_get(tunnel, addr); + if (entry == NULL) + raddr = 0; + else + raddr = entry->raddr; + rcu_read_unlock(); + return raddr; +} + +#endif /* Tunnel hash table */