Message ID:    49F71B63.8010503@cosmosbay.com
State:         Not Applicable, archived
Delegated to:  David Miller
For the udpping test load these patches have barely any effect:

git2p1 is the first version of the patch
git2p2 is the second version (this one)

Same CPU
Kernel             Test 1  Test 2  Test 3  Test 4  Average
2.6.22              83.03   82.90   82.89   82.92  82.935
2.6.23              83.35   82.81   82.83   82.86  82.9625
2.6.24              82.66   82.56   82.64   82.73  82.6475
2.6.25              84.28   84.29   84.37   84.30  84.31
2.6.26              84.72   84.38   84.41   84.68  84.5475
2.6.27              84.56   84.44   84.41   84.58  84.4975
2.6.28              84.70   84.43   84.47   84.48  84.52
2.6.29              84.91   84.67   84.69   84.75  84.755
2.6.30-rc2          84.94   84.72   84.69   84.93  84.82
2.6.30-rc3          84.88   84.70   84.73   84.89  84.80
2.6.30-rc3-git2p1   84.89   84.77   84.79   84.85  84.825
2.6.30-rc3-git2p2   84.91   84.79   84.78   84.80  84.82

Same Core
Kernel             Test 1  Test 2  Test 3  Test 4  Average
2.6.22              84.60   84.71   84.52   84.53  84.59
2.6.23              84.59   84.50   84.33   84.34  84.44
2.6.24              84.28   84.30   84.38   84.28  84.31
2.6.25              86.12   85.80   86.20   86.04  86.04
2.6.26              86.61   86.46   86.49   86.70  86.565
2.6.27              87.00   87.01   87.00   86.95  86.99
2.6.28              86.53   86.44   86.26   86.24  86.3675
2.6.29              85.88   85.94   86.10   85.69  85.9025
2.6.30-rc2          86.03   85.93   85.99   86.06  86.0025
2.6.30-rc3          85.73   85.88   85.67   85.94  85.805
2.6.30-rc3-git2p1   86.11   85.80   86.03   85.92  85.965
2.6.30-rc3-git2p2   86.04   85.96   85.89   86.04  85.9825

Same Socket
Kernel             Test 1  Test 2  Test 3  Test 4  Average
2.6.22              90.08   89.72   90.00   89.90  89.925
2.6.23              89.72   90.10   89.99   89.86  89.9175
2.6.24              89.18   89.28   89.25   89.22  89.2325
2.6.25              90.83   90.78   90.87   90.61  90.7725
2.6.26              90.51   91.25   91.80   91.69  91.3125
2.6.27              91.98   91.93   91.97   91.91  91.9475
2.6.28              91.72   91.70   91.84   91.75  91.7525
2.6.29              89.85   89.85   90.14   89.90  89.935
2.6.30-rc2          90.78   90.80   90.87   90.73  90.795
2.6.30-rc3          90.84   90.94   91.05   90.84  90.9175
2.6.30-rc3-git2p1   90.87   90.95   90.86   90.92  90.90
2.6.30-rc3-git2p2   91.09   91.01   90.97   91.06  91.0325

Different Socket
Kernel             Test 1  Test 2  Test 3  Test 4  Average
2.6.22              91.64   91.65   91.61   91.68  91.645
2.6.23              91.90   91.84   91.92   91.83  91.873
2.6.24              91.33   91.24   91.42   91.38  91.343
2.6.25              92.39   92.04   92.30   92.23  92.240
2.6.26              90.64   90.57   90.60   90.08  90.473
2.6.27              91.14   91.26   90.90   91.09  91.098
2.6.28              92.30   91.92   92.30   92.23  92.188
2.6.29              90.57   89.83   89.90   90.41  90.178
2.6.30-rc2          90.59   90.97   90.27   91.69  90.880
2.6.30-rc3          92.08   91.32   91.21   92.06  91.668
2.6.30-rc3-git2p1   91.46   91.38   91.92   91.03  91.448
2.6.30-rc3-git2p2   91.39   90.47   90.03   90.62  90.628

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Lameter wrote:
> For the udpping test load these patches have barely any effect:
>
> git2p1 is the first version of the patch
> git2p2 is the second version (this one)

But... udpping does *not* use poll() nor select(), unless I am mistaken?

If you really want to test this patch with udpping, you might add a
poll() call before recvfrom():

	while (1) {
+		struct pollfd pfd = { .fd = sock, .events = POLLIN };
+
+		poll(&pfd, 1, -1);
		nbytes = recvfrom(sock, msg, min(inblocksize, sizeof(msg)), 0,
				  &inad, &inadlen);
		if (nbytes < 0) {
			perror("recvfrom");
			break;
		}
		if (sendto(sock, msg, nbytes, 0, &inad, inadlen) < 0) {
			perror("sendto");
			break;
		}
	}

The part about recvfrom() wakeup avoidance is in David's net-2.6 tree,
and saves 2 us on udpping here. Is it what you call git2p1?
On Tue, 28 Apr 2009, Eric Dumazet wrote:

> The part about recvfrom() wakeup avoidance is in David's net-2.6 tree,
> and saves 2 us on udpping here.
>
> Is it what you call git2p1?

No, that is just the prior version of the poll/select improvements.
Which patch are you referring to?
On Tue, 28 Apr 2009 17:06:11 +0200 Eric Dumazet <dada1@cosmosbay.com> wrote:

> [PATCH] poll: Avoid extra wakeups in select/poll
>
> After introduction of the keyed wakeups Davide Libenzi did on epoll,
> we are able to avoid spurious wakeups in poll()/select() code too.
>
> For example, a typical use of poll()/select() is to wait for incoming
> network frames on many sockets. But TX completion for UDP/TCP frames
> calls sock_wfree(), which in turn schedules the thread.
>
> When scheduled, the thread does a full scan of all polled fds and can
> sleep again, because nothing is really available. If the number of fds
> is large, this causes significant load.
>
> This patch makes select()/poll() aware of keyed wakeups, so useless
> wakeups are avoided. This reduces the number of context switches by
> about 50% on some setups, and the work performed by softirq handlers.

Seems that this is a virtuous patch even though Christoph is struggling
a bit to test it?

>  fs/select.c          |   28 +++++++++++++++++++++++++---
>  include/linux/poll.h |    3 +++
>  2 files changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/fs/select.c b/fs/select.c
> index 0fe0e14..2708187 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -168,7 +168,7 @@ static struct poll_table_entry *poll_get_entry(struct poll_wqueues *p)
>  	return table->entry++;
>  }
>
> -static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +static int __pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
>  {
>  	struct poll_wqueues *pwq = wait->private;
>  	DECLARE_WAITQUEUE(dummy_wait, pwq->polling_task);
> @@ -194,6 +194,16 @@ static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
>  	return default_wake_function(&dummy_wait, mode, sync, key);
>  }
>
> +static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +	struct poll_table_entry *entry;
> +
> +	entry = container_of(wait, struct poll_table_entry, wait);
> +	if (key && !((unsigned long)key & entry->key))
> +		return 0;
> +	return __pollwake(wait, mode, sync, key);
> +}
> +
>  /* Add a new entry */
>  static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
>  				poll_table *p)
> @@ -205,6 +215,7 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
>  	get_file(filp);
>  	entry->filp = filp;
>  	entry->wait_address = wait_address;
> +	entry->key = p->key;
>  	init_waitqueue_func_entry(&entry->wait, pollwake);
>  	entry->wait.private = pwq;
>  	add_wait_queue(wait_address, &entry->wait);
> @@ -418,8 +429,16 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
>  			if (file) {
>  				f_op = file->f_op;
>  				mask = DEFAULT_POLLMASK;
> -				if (f_op && f_op->poll)
> +				if (f_op && f_op->poll) {
> +					if (wait) {
> +						wait->key = POLLEX_SET;
> +						if (in & bit)
> +							wait->key |= POLLIN_SET;
> +						if (out & bit)
> +							wait->key |= POLLOUT_SET;
> +					}
>  					mask = (*f_op->poll)(file, retval ? NULL : wait);
> +				}

<resizes xterm rather a lot>

Can we (and should we) avoid all that manipulation of wait->key if
`retval' is zero?

> --- a/include/linux/poll.h
> +++ b/include/linux/poll.h
> @@ -32,6 +32,7 @@ typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *, struct poll_
>
>  typedef struct poll_table_struct {
>  	poll_queue_proc qproc;
> +	unsigned long key;
>  } poll_table;
>
>  static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
> @@ -43,10 +44,12 @@ static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_addres
>  static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
>  {
>  	pt->qproc = qproc;
> +	pt->key = ~0UL; /* all events enabled */

I kind of prefer to use plain old -1 for the all-ones pattern. Because
it always just works, and doesn't send the reviewer off to check if the
type was really u64 or something.

It's a bit ugly though.

>  }
>
>  struct poll_table_entry {
>  	struct file *filp;
> +	unsigned long key;
>  	wait_queue_t wait;
>  	wait_queue_head_t *wait_address;
>  };
> Seems that this is a virtuous patch even though Christoph is struggling
> a bit to test it?

The main drawback is that the select/poll data structures will get
larger. That could cause a regression in theory, but I suspect the win
in some situations is still worth it. Of course, it would be nice if it
handled more situations (like multiple readers etc.).

-Andi
Andi Kleen wrote:
>> Seems that this is a virtuous patch even though Christoph is struggling
>> a bit to test it?
>
> The main drawback is that the select/poll data structures will get
> larger. That could cause a regression in theory, but I suspect the win
> in some situations is still worth it. Of course, it would be nice if it
> handled more situations (like multiple readers etc.).

At the poll()/select() interface we must wake up every poller, because we
don't know whether they will really consume the event:

	thread 1:	poll();
			<insert an exit() or something bad here>
			read();

	thread 2:	poll();	/* no return, because the event was
				 * 'grantedted' to thread 1 */
			read();

We could try to optimize read()/recvfrom(), because there we do know
whether the event is consumed, as it's a whole syscall.
Andrew Morton wrote:
> On Tue, 28 Apr 2009 17:06:11 +0200 Eric Dumazet <dada1@cosmosbay.com> wrote:
>
>> [PATCH] poll: Avoid extra wakeups in select/poll
>>
>> After introduction of the keyed wakeups Davide Libenzi did on epoll,
>> we are able to avoid spurious wakeups in poll()/select() code too.
>>
>> For example, a typical use of poll()/select() is to wait for incoming
>> network frames on many sockets. But TX completion for UDP/TCP frames
>> calls sock_wfree(), which in turn schedules the thread.
>>
>> When scheduled, the thread does a full scan of all polled fds and can
>> sleep again, because nothing is really available. If the number of fds
>> is large, this causes significant load.
>>
>> This patch makes select()/poll() aware of keyed wakeups, so useless
>> wakeups are avoided. This reduces the number of context switches by
>> about 50% on some setups, and the work performed by softirq handlers.
>
> Seems that this is a virtuous patch even though Christoph is struggling
> a bit to test it?
>
>>  fs/select.c          |   28 +++++++++++++++++++++++++---
>>  include/linux/poll.h |    3 +++
>>  2 files changed, 28 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/select.c b/fs/select.c
>> index 0fe0e14..2708187 100644
>> --- a/fs/select.c
>> +++ b/fs/select.c
>> @@ -168,7 +168,7 @@ static struct poll_table_entry *poll_get_entry(struct poll_wqueues *p)
>>  	return table->entry++;
>>  }
>>
>> -static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
>> +static int __pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
>>  {
>>  	struct poll_wqueues *pwq = wait->private;
>>  	DECLARE_WAITQUEUE(dummy_wait, pwq->polling_task);
>> @@ -194,6 +194,16 @@ static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
>>  	return default_wake_function(&dummy_wait, mode, sync, key);
>>  }
>>
>> +static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
>> +{
>> +	struct poll_table_entry *entry;
>> +
>> +	entry = container_of(wait, struct poll_table_entry, wait);
>> +	if (key && !((unsigned long)key & entry->key))
>> +		return 0;
>> +	return __pollwake(wait, mode, sync, key);
>> +}
>> +
>>  /* Add a new entry */
>>  static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
>>  				poll_table *p)
>> @@ -205,6 +215,7 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
>>  	get_file(filp);
>>  	entry->filp = filp;
>>  	entry->wait_address = wait_address;
>> +	entry->key = p->key;
>>  	init_waitqueue_func_entry(&entry->wait, pollwake);
>>  	entry->wait.private = pwq;
>>  	add_wait_queue(wait_address, &entry->wait);
>> @@ -418,8 +429,16 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
>>  			if (file) {
>>  				f_op = file->f_op;
>>  				mask = DEFAULT_POLLMASK;
>> -				if (f_op && f_op->poll)
>> +				if (f_op && f_op->poll) {
>> +					if (wait) {
>> +						wait->key = POLLEX_SET;
>> +						if (in & bit)
>> +							wait->key |= POLLIN_SET;
>> +						if (out & bit)
>> +							wait->key |= POLLOUT_SET;
>> +					}
>>  					mask = (*f_op->poll)(file, retval ? NULL : wait);
>> +				}
>
> <resizes xterm rather a lot>
>
> Can we (and should we) avoid all that manipulation of wait->key if
> `retval' is zero?

Yes, we could set wait to NULL as soon as retval is incremented, and
then also do:

	mask = (*f_op->poll)(file, wait);

>> --- a/include/linux/poll.h
>> +++ b/include/linux/poll.h
>> @@ -32,6 +32,7 @@ typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *, struct poll_
>>
>>  typedef struct poll_table_struct {
>>  	poll_queue_proc qproc;
>> +	unsigned long key;
>>  } poll_table;
>>
>>  static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
>> @@ -43,10 +44,12 @@ static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_addres
>>  static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
>>  {
>>  	pt->qproc = qproc;
>> +	pt->key = ~0UL; /* all events enabled */
>
> I kind of prefer to use plain old -1 for the all-ones pattern. Because
> it always just works, and doesn't send the reviewer off to check if the
> type was really u64 or something.
>
> It's a bit ugly though.
>
>>  }
>>
>>  struct poll_table_entry {
>>  	struct file *filp;
>> +	unsigned long key;
>>  	wait_queue_t wait;
>>  	wait_queue_head_t *wait_address;
>>  };
* Eric Dumazet <dada1@cosmosbay.com> wrote:

> @@ -418,8 +429,16 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
>  			if (file) {
>  				f_op = file->f_op;
>  				mask = DEFAULT_POLLMASK;
> -				if (f_op && f_op->poll)
> +				if (f_op && f_op->poll) {
> +					if (wait) {
> +						wait->key = POLLEX_SET;
> +						if (in & bit)
> +							wait->key |= POLLIN_SET;
> +						if (out & bit)
> +							wait->key |= POLLOUT_SET;
> +					}
>  					mask = (*f_op->poll)(file, retval ? NULL : wait);
> +				}
>  				fput_light(file, fput_needed);
>  				if ((mask & POLLIN_SET) && (in & bit)) {
>  					res_in |= bit;

Please factor this whole 'if (file)' branch out into a helper. Typical
indentation levels go from 1 to 3 tabs - 4 should be avoided if possible
and 5 is pretty excessive already. This goes to eight.

	Ingo
* Andi Kleen <andi@firstfloor.org> wrote:

> > Seems that this is a virtuous patch even though Christoph is
> > struggling a bit to test it?
>
> The main drawback is that the select/poll data structures will get
> larger. That could cause regression in theory. [...]

The current size of struct poll_table_entry is 0x38 on 64-bit kernels.
Adding the key will make it 0x40 - which is not only a power of two but
also matches the cache line size of most modern CPUs. So the size of
this structure is ideal now, and arithmetic on the poll table has become
simpler as well.

So the patch has my ack:

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo
diff --git a/fs/select.c b/fs/select.c
index 0fe0e14..2708187 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -168,7 +168,7 @@ static struct poll_table_entry *poll_get_entry(struct poll_wqueues *p)
 	return table->entry++;
 }
 
-static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
+static int __pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
 {
 	struct poll_wqueues *pwq = wait->private;
 	DECLARE_WAITQUEUE(dummy_wait, pwq->polling_task);
@@ -194,6 +194,16 @@ static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
 	return default_wake_function(&dummy_wait, mode, sync, key);
 }
 
+static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
+{
+	struct poll_table_entry *entry;
+
+	entry = container_of(wait, struct poll_table_entry, wait);
+	if (key && !((unsigned long)key & entry->key))
+		return 0;
+	return __pollwake(wait, mode, sync, key);
+}
+
 /* Add a new entry */
 static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
 				poll_table *p)
@@ -205,6 +215,7 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
 	get_file(filp);
 	entry->filp = filp;
 	entry->wait_address = wait_address;
+	entry->key = p->key;
 	init_waitqueue_func_entry(&entry->wait, pollwake);
 	entry->wait.private = pwq;
 	add_wait_queue(wait_address, &entry->wait);
@@ -418,8 +429,16 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 			if (file) {
 				f_op = file->f_op;
 				mask = DEFAULT_POLLMASK;
-				if (f_op && f_op->poll)
+				if (f_op && f_op->poll) {
+					if (wait) {
+						wait->key = POLLEX_SET;
+						if (in & bit)
+							wait->key |= POLLIN_SET;
+						if (out & bit)
+							wait->key |= POLLOUT_SET;
+					}
 					mask = (*f_op->poll)(file, retval ? NULL : wait);
+				}
 				fput_light(file, fput_needed);
 				if ((mask & POLLIN_SET) && (in & bit)) {
 					res_in |= bit;
@@ -685,8 +704,11 @@ static inline unsigned int do_pollfd(struct pollfd *pollfd, poll_table *pwait)
 	mask = POLLNVAL;
 	if (file != NULL) {
 		mask = DEFAULT_POLLMASK;
-		if (file->f_op && file->f_op->poll)
+		if (file->f_op && file->f_op->poll) {
+			if (pwait)
+				pwait->key = pollfd->events | POLLERR | POLLHUP;
 			mask = file->f_op->poll(file, pwait);
+		}
 		/* Mask out unneeded events. */
 		mask &= pollfd->events | POLLERR | POLLHUP;
 		fput_light(file, fput_needed);
diff --git a/include/linux/poll.h b/include/linux/poll.h
index 8c24ef8..3327389 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -32,6 +32,7 @@ typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *, struct poll_
 
 typedef struct poll_table_struct {
 	poll_queue_proc qproc;
+	unsigned long key;
 } poll_table;
 
 static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
@@ -43,10 +44,12 @@ static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_addres
 static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
 {
 	pt->qproc = qproc;
+	pt->key = ~0UL; /* all events enabled */
 }
 
 struct poll_table_entry {
 	struct file *filp;
+	unsigned long key;
 	wait_queue_t wait;
 	wait_queue_head_t *wait_address;
 };