Message ID:    1383035267-19604-1-git-send-email-mpa@pengutronix.de
State:         Changes Requested, archived
Delegated to:  David Miller
On Tue, 2013-10-29 at 09:27 +0100, Markus Pargmann wrote:
> This patch speeds up the rx_poll function by reducing the number of
> register reads.
[]
> The third change is to replace the for-loop by a find_next_bit loop.

You need to update the commit message.

> diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
[]
> @@ -798,17 +798,19 @@ static int c_can_do_rx_poll(struct net_device *dev, int quota)
[]
> +	while (quota > 0 && (val = priv->read_reg(priv, C_CAN_INTPND1_REG))) {
> +		while ((msg_obj = ffs(val)) && quota > 0) {
> +			val &= ~BIT(msg_obj - 1);

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
On Tue, Oct 29, 2013 at 01:34:48AM -0700, Joe Perches wrote:
> On Tue, 2013-10-29 at 09:27 +0100, Markus Pargmann wrote:
> > This patch speeds up the rx_poll function by reducing the number of
> > register reads.
> []
> > 125kbit:
> >   Function              Hit     Time            Avg         s^2
> >   --------              ---     ----            ---         ---
> >   c_can_do_rx_poll      63960   10168178 us     158.977 us  1493056 us
> > With patch:
> >   c_can_do_rx_poll      63939   4268457 us      66.758 us   818790.9 us
> >
> > 1Mbit:
> >   Function              Hit     Time            Avg         s^2
> >   --------              ---     ----            ---         ---
> >   c_can_do_rx_poll      69489   30049498 us     432.435 us  9271851 us
> > With patch:
> >   c_can_do_rx_poll      103034  24220362 us     235.071 us  6016656 us
>
> Also nicer if you updated the timings table

Yes I just measured the timings again:

./perf_can_test.sh 125000 30
  Function                        Hit      Time            Avg         s^2
  --------                        ---      ----            ---         ---
  c_can_poll                      127882   6178806 us      48.316 us   4393411 us
  c_can_do_rx_poll                63941    3764057 us      58.867 us   776162.2 us
  c_can_enable_all_interrupts     255764   2213697 us      8.655 us    1093934 us
  c_can_plat_write_reg_aligned_t  807252   1607115 us      1.990 us    10053684 us
  c_can_isr                       127882   1220001 us      9.540 us    789.495 us
  c_can_object_put                119887   1039222 us      8.668 us    1608.668 us
  c_can_plat_read_reg_aligned_to  1015072  1033283 us      1.017 us    7021.465 us
  c_can_read_msg_object           63943    1026159 us      16.048 us   718.464 us
  c_can_activate_all_lower_rx_ms  7992     755782.4 us     94.567 us   55.270 us
  c_can_mark_rx_msg_obj           55951    709072.1 us     12.673 us   39.974 us
  c_can_object_get                63943    555669.2 us     8.690 us    96.211 us
  c_can_msg_obj_is_busy           183830   527826.1 us     2.871 us    7289.221 us
  alloc_can_skb                   63943    170966.6 us     2.673 us    165.765 us
  c_can_has_and_handle_berr       63941    47063.18 us     0.736 us    2.757 us

./perf_can_test.sh 1000000 30
  Function                        Hit      Time            Avg         s^2
  --------                        ---      ----            ---         ---
  c_can_poll                      270678   30290751 us     111.906 us  5809627 us
  c_can_do_rx_poll                207109   24322185 us     117.436 us  171469047 us
  c_can_object_put                841431   7278794 us      8.650 us    305841.0 us
  c_can_plat_write_reg_aligned_t  4037180  6244636 us      1.546 us    4581066 us
  c_can_read_msg_object           453988   6033464 us      13.289 us   128809.6 us
  c_can_enable_all_interrupts     541342   5849826 us      10.806 us   22458900 us
  c_can_activate_all_lower_rx_ms  55349    5237761 us      94.631 us   19004.79 us
  c_can_mark_rx_msg_obj           387429   4865632 us      12.558 us   380606.4 us
  c_can_plat_read_reg_aligned_to  4597629  4633247 us      1.007 us    315286.2 us
  c_can_object_get                453988   3919692 us      8.633 us    59847.76 us
  c_can_msg_obj_is_busy           1295419  3706862 us      2.861 us    316655.7 us
  c_can_isr                       270671   2496734 us      9.224 us    530.967 us
  alloc_can_skb                   453988   856917.4 us     1.887 us    18157.68 us
  c_can_activate_rx_msg_obj       11210    141177.4 us     12.593 us   123.068 us
  c_can_has_and_handle_berr       63569    44995.85 us     0.707 us    12.780 us

It is interesting that the number of hits for c_can_do_rx_poll is twice
as much as it was with find_next_bit. Unfortunately this reduces the
overall benefit of this patch.

Any ideas how to increase the number of waiting message objects we
handle in one poll call?

Regards,

Markus
On Tue, 2013-10-29 at 09:58 +0100, Markus Pargmann wrote:
> On Tue, Oct 29, 2013 at 01:34:48AM -0700, Joe Perches wrote:
> > On Tue, 2013-10-29 at 09:27 +0100, Markus Pargmann wrote:
> > > This patch speeds up the rx_poll function by reducing the number of
> > > register reads.
> > []
> > > 125kbit:
> > >   Function              Hit     Time            Avg         s^2
> > >   --------              ---     ----            ---         ---
> > >   c_can_do_rx_poll      63960   10168178 us     158.977 us  1493056 us
> > > With patch:
> > >   c_can_do_rx_poll      63939   4268457 us      66.758 us   818790.9 us
> > >
> > > 1Mbit:
> > >   Function              Hit     Time            Avg         s^2
> > >   --------              ---     ----            ---         ---
> > >   c_can_do_rx_poll      69489   30049498 us     432.435 us  9271851 us
> > > With patch:
> > >   c_can_do_rx_poll      103034  24220362 us     235.071 us  6016656 us
[]
> Yes I just measured the timings again:
[]
> ./perf_can_test.sh 125000 30
[]
>   c_can_do_rx_poll                63941    3764057 us      58.867 us   776162.2 us

Good, it's slightly faster still.

> ./perf_can_test.sh 1000000 30
[]
>   c_can_do_rx_poll                207109   24322185 us     117.436 us  171469047 us
[]
> It is interesting that the number of hits for c_can_do_rx_poll is twice
> as much as it was with find_next_bit.

How is this possible?  Any idea?
On Tue, Oct 29, 2013 at 09:24:05AM -0700, Joe Perches wrote:
> On Tue, 2013-10-29 at 09:58 +0100, Markus Pargmann wrote:
> > Yes I just measured the timings again:
> []
> > ./perf_can_test.sh 125000 30
> []
> >   c_can_do_rx_poll                63941    3764057 us      58.867 us   776162.2 us
>
> Good, it's slightly faster still.
>
> > ./perf_can_test.sh 1000000 30
> []
> >   c_can_do_rx_poll                207109   24322185 us     117.436 us  171469047 us
> []
> > It is interesting that the number of hits for c_can_do_rx_poll is twice
> > as much as it was with find_next_bit.
>
> How is this possible?  Any idea?

Perhaps the can core received more messages in the previous patch
version while being in the poll function. This way it can handle more
messages without a new interrupt/poll call. The new patch is faster, so
it does not receive as many new messages as the old one, leading to more
interrupts and fewer handled messages per poll call.

Regards,

Markus Pargmann
diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index a668cd4..428681e 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -798,17 +798,19 @@ static int c_can_do_rx_poll(struct net_device *dev, int quota)
 	u32 num_rx_pkts = 0;
 	unsigned int msg_obj, msg_ctrl_save;
 	struct c_can_priv *priv = netdev_priv(dev);
-	u32 val = c_can_read_reg32(priv, C_CAN_INTPND1_REG);
+	u16 val;
+
+	/*
+	 * It is faster to read only one 16bit register. This is only possible
+	 * for a maximum number of 16 objects.
+	 */
+	BUILD_BUG_ON_MSG(C_CAN_MSG_OBJ_RX_LAST > 16,
+			"Implementation does not support more message objects than 16");
+
+	while (quota > 0 && (val = priv->read_reg(priv, C_CAN_INTPND1_REG))) {
+		while ((msg_obj = ffs(val)) && quota > 0) {
+			val &= ~BIT(msg_obj - 1);
-	for (msg_obj = C_CAN_MSG_OBJ_RX_FIRST;
-	     msg_obj <= C_CAN_MSG_OBJ_RX_LAST && quota > 0;
-	     val = c_can_read_reg32(priv, C_CAN_INTPND1_REG),
-	     msg_obj++) {
-		/*
-		 * as interrupt pending register's bit n-1 corresponds to
-		 * message object n, we need to handle the same properly.
-		 */
-		if (val & (1 << (msg_obj - 1))) {
 			c_can_object_get(dev, 0, msg_obj,
 					IF_COMM_ALL & ~IF_COMM_TXRQST);
 			msg_ctrl_save = priv->read_reg(priv,
This patch speeds up the rx_poll function by reducing the number of
register reads.

Replace the 32bit register read by a 16bit register read. Currently the
32bit register read is implemented using two 16bit reads. This is
inefficient as we only use the lower 16 bits in rx_poll.

The for loop reads the pending interrupts in every iteration. This leads
to up to 16 reads of pending interrupts. The patch introduces a new
outer loop to read the pending interrupts as long as 'quota' is above 0.
This reduces the total number of reads.

The third change is to replace the for-loop by a find_next_bit loop.

Tested on AM335x. I removed all 'static' and 'inline' from c_can.c to
see the timings for all functions. I used the function tracer with
trace_stats.

125kbit:
  Function              Hit     Time            Avg         s^2
  --------              ---     ----            ---         ---
  c_can_do_rx_poll      63960   10168178 us     158.977 us  1493056 us
With patch:
  c_can_do_rx_poll      63939   4268457 us      66.758 us   818790.9 us

1Mbit:
  Function              Hit     Time            Avg         s^2
  --------              ---     ----            ---         ---
  c_can_do_rx_poll      69489   30049498 us     432.435 us  9271851 us
With patch:
  c_can_do_rx_poll      103034  24220362 us     235.071 us  6016656 us

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
---
Notes:
    Changes in v2:
     - Small changes, find_next_bit -> ffs and other

 drivers/net/can/c_can/c_can.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)