usbnet: smsc95xx: simplify tx_fixup code

Message ID 20181002165602.21033-1-ben.dooks@codethink.co.uk
State Changes Requested, archived
Delegated to: David Miller
Series usbnet: smsc95xx: simplify tx_fixup code

Commit Message

Ben Dooks Oct. 2, 2018, 4:56 p.m. UTC
The smsc95xx_tx_fixup is doing multiple calls to skb_push() to
put an 8-byte command header onto the packet. It would be easier
to do one skb_push() and then copy the data in once the push is
done.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
---
 drivers/net/usb/smsc95xx.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

Comments

David Laight Oct. 3, 2018, 1:36 p.m. UTC | #1
From: Ben Dooks
> Sent: 02 October 2018 17:56
> 
> The smsc95xx_tx_fixup is doing multiple calls to skb_push() to
> put an 8-byte command header onto the packet. It would be easier
> to do one skb_push() and then copy the data in once the push is
> done.
> 
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> ---
>  drivers/net/usb/smsc95xx.c | 25 +++++++++++++------------
>  1 file changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
> index cb19aea139d3..813ab93ee2c3 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -2006,6 +2006,7 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
>  	bool csum = skb->ip_summed == CHECKSUM_PARTIAL;
>  	int overhead = csum ? SMSC95XX_TX_OVERHEAD_CSUM : SMSC95XX_TX_OVERHEAD;
>  	u32 tx_cmd_a, tx_cmd_b;
> +	void *ptr;

It might be useful to define a structure for the header.
You might need to find the 'store unaligned 32bit word' macro though.
(Actually that will probably be better than the memcpy() which might
end up doing memory-memory copies rather than storing the register.)
Although if/when you add the tx alignment that won't be needed because the
header will be aligned.

>  	/* We do not advertise SG, so skbs should be already linearized */
>  	BUG_ON(skb_shinfo(skb)->nr_frags);
> @@ -2019,6 +2020,9 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
>  		return NULL;
>  	}
> 
> +	tx_cmd_b = (u32)skb->len;
> +	tx_cmd_a = tx_cmd_b | TX_CMD_A_FIRST_SEG_ | TX_CMD_A_LAST_SEG_;
> +
>  	if (csum) {
>  		if (skb->len <= 45) {
>  			/* workaround - hardware tx checksum does not work
> @@ -2035,21 +2039,18 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
>  			skb_push(skb, 4);
>  			cpu_to_le32s(&csum_preamble);

Not related, but csum_preamble = cpu_to_le32(csum_preamble) is likely to
generate better code (at least for some architectures).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Ben Dooks Oct. 3, 2018, 4:25 p.m. UTC | #2
On 2018-10-03 14:36, David Laight wrote:
> From: Ben Dooks
>> Sent: 02 October 2018 17:56
>> 
>> The smsc95xx_tx_fixup is doing multiple calls to skb_push() to
>> put an 8-byte command header onto the packet. It would be easier
>> to do one skb_push() and then copy the data in once the push is
>> done.
>> 
>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>> ---
>>  drivers/net/usb/smsc95xx.c | 25 +++++++++++++------------
>>  1 file changed, 13 insertions(+), 12 deletions(-)
>> 
>> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
>> index cb19aea139d3..813ab93ee2c3 100644
>> --- a/drivers/net/usb/smsc95xx.c
>> +++ b/drivers/net/usb/smsc95xx.c
>> @@ -2006,6 +2006,7 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
>>  	bool csum = skb->ip_summed == CHECKSUM_PARTIAL;
>>  	int overhead = csum ? SMSC95XX_TX_OVERHEAD_CSUM : SMSC95XX_TX_OVERHEAD;
>>  	u32 tx_cmd_a, tx_cmd_b;
>> +	void *ptr;
> 
> It might be useful to define a structure for the header.
> You might need to find the 'store unaligned 32bit word' macro though.
> (Actually that will probably be better than the memcpy() which might
> end up doing memory-memory copies rather than storing the register.)
> Although if/when you add the tx alignment that won't be needed because the
> header will be aligned.

Ok, might be worth doing.

I did try to do a "u32 tx_cmd[2]" but the code generated ended up storing
stuff onto the stack before copying into the packet. I agree that possibly
going to the "put_unaligned" function might be nicer too.

If we did enable tx-align all the time then we'd not have to care about the
alignment, but I didn't want to do that if possible as that would end up
sending up to 3 bytes extra per packet.
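
To illustrate the "up to 3 bytes extra" figure, here is a minimal sketch. It assumes that tx-align pads each frame so the data starts on a 4-byte boundary; the helper name `tx_align_pad` is hypothetical and is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes of padding needed to move 'addr' up to the
 * next 4-byte boundary. The result is always in the range 0..3. */
static unsigned int tx_align_pad(uintptr_t addr)
{
	unsigned int pad = (unsigned int)(-addr & 3u);

	assert(pad <= 3);
	return pad;
}
```

An already aligned address needs no padding, while an address one byte past a boundary needs the full 3 bytes, which is the worst case mentioned above.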

I am trying not to do too many changes at one time, to allow rollback.

>>  	/* We do not advertise SG, so skbs should be already linearized */
>>  	BUG_ON(skb_shinfo(skb)->nr_frags);
>> @@ -2019,6 +2020,9 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
>>  		return NULL;
>>  	}
>> 
>> +	tx_cmd_b = (u32)skb->len;
>> +	tx_cmd_a = tx_cmd_b | TX_CMD_A_FIRST_SEG_ | TX_CMD_A_LAST_SEG_;
>> +
>>  	if (csum) {
>>  		if (skb->len <= 45) {
>>  			/* workaround - hardware tx checksum does not work
>> @@ -2035,21 +2039,18 @@ static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
>>  			skb_push(skb, 4);
>>  			cpu_to_le32s(&csum_preamble);
> 
> Not related, but csum_preamble = cpu_to_le32(csum_preamble) is likely to
> generate better code (at least for some architectures).
> 
> 	David
> 
David Miller Oct. 5, 2018, 9:24 p.m. UTC | #3
From: Ben Dooks <ben.dooks@codethink.co.uk>
Date: Tue,  2 Oct 2018 17:56:02 +0100

> -	memcpy(skb->data, &tx_cmd_a, 4);
> +	ptr = skb_push(skb, 8);
> +	tx_cmd_a = cpu_to_le32(tx_cmd_a);
> +	tx_cmd_b = cpu_to_le32(tx_cmd_b);
> +	memcpy(ptr, &tx_cmd_a, 4);
> +	memcpy(ptr+4, &tx_cmd_b, 4);

Even a memcpy() through a void pointer does not guarantee that gcc will
not emit word sized loads and stores.

You must use the get_unaligned()/put_unaligned() facilities to do this
properly.

I also agree that making a proper type and structure instead of using
a void pointer would be better.
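
As a rough illustration of what an unaligned little-endian store amounts to, here is a minimal userspace sketch. In the kernel, `put_unaligned_le32()` would be the natural helper; the functions below (`sketch_put_le32`, `sketch_write_tx_header`) are hypothetical stand-ins and not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian byte order at any address.
 * Byte-wise stores make no assumption about the alignment of 'p'. */
static void sketch_put_le32(uint32_t val, void *p)
{
	uint8_t *b = p;

	b[0] = (uint8_t)(val & 0xff);
	b[1] = (uint8_t)((val >> 8) & 0xff);
	b[2] = (uint8_t)((val >> 16) & 0xff);
	b[3] = (uint8_t)((val >> 24) & 0xff);
}

/* Write the 8-byte command header (tx_cmd_a, then tx_cmd_b) at 'hdr'. */
static void sketch_write_tx_header(void *hdr, uint32_t tx_cmd_a,
				   uint32_t tx_cmd_b)
{
	assert(hdr != NULL);
	sketch_put_le32(tx_cmd_a, hdr);
	sketch_put_le32(tx_cmd_b, (uint8_t *)hdr + 4);
}
```

In the driver this would presumably reduce to two `put_unaligned_le32()` calls on the pointer returned by the single `skb_push(skb, 8)`, which also removes the separate `cpu_to_le32()` conversions.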
Ben Dooks Oct. 6, 2018, 11:27 a.m. UTC | #4
On 2018-10-05 22:24, David Miller wrote:
> From: Ben Dooks <ben.dooks@codethink.co.uk>
> Date: Tue,  2 Oct 2018 17:56:02 +0100
> 
>> -	memcpy(skb->data, &tx_cmd_a, 4);
>> +	ptr = skb_push(skb, 8);
>> +	tx_cmd_a = cpu_to_le32(tx_cmd_a);
>> +	tx_cmd_b = cpu_to_le32(tx_cmd_b);
>> +	memcpy(ptr, &tx_cmd_a, 4);
>> +	memcpy(ptr+4, &tx_cmd_b, 4);
> 
> Even a memcpy() through a void pointer does not guarantee that gcc will
> not emit word sized loads and stores.
> 
> You must use the get_unaligned()/put_unaligned() facilities to do this
> properly.

Thanks, got a new version of the series just being tested with this.
Should it go into the original, or as a separate change?

> 
> I also agree that making a proper type and structure instead of using
> a void pointer would be better.
David Miller Oct. 6, 2018, 5:28 p.m. UTC | #5
From: Ben Dooks <ben.dooks@codethink.co.uk>
Date: Sat, 06 Oct 2018 12:27:27 +0100

> Thanks, got a new version of the series just being tested with this.
> Should it go into the original, or as a separate change?

Into the original.
David Laight Oct. 8, 2018, 8:41 a.m. UTC | #6
From: David Miller
> Sent: 05 October 2018 22:24
> 
> From: Ben Dooks <ben.dooks@codethink.co.uk>
> Date: Tue,  2 Oct 2018 17:56:02 +0100
> 
> > -	memcpy(skb->data, &tx_cmd_a, 4);
> > +	ptr = skb_push(skb, 8);
> > +	tx_cmd_a = cpu_to_le32(tx_cmd_a);
> > +	tx_cmd_b = cpu_to_le32(tx_cmd_b);
> > +	memcpy(ptr, &tx_cmd_a, 4);
> > +	memcpy(ptr+4, &tx_cmd_b, 4);
> 
> Even a memcpy() through a void pointer does not guarantee that gcc will
> not emit word sized loads and stores.

True, but only if gcc can 'see' something that would require the
pointer be aligned.
In this case the void pointer comes from an external function
so is fine.

> You must use the get_unaligned()/put_unaligned() facilities to do this
> properly.
> 
> I also agree that making a proper type and structure instead of using
> a void pointer would be better.

The structure would need to be marked 'packed' - since its alignment
isn't guaranteed.
Then you don't need to use put_unaligned().

If it wasn't 'packed' then gcc would implement
memcpy(&hdr->tx_cmd_a, &tx_cmd_a, 4) using an aligned write.
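
A sketch of the packed-structure approach described above, assuming GCC/Clang attribute syntax. The type and function names (`sketch_tx_hdr`, `sketch_fill_hdr`) are hypothetical; in the kernel the fields would more likely be `__le32` and the attribute spelled `__packed`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header layout. 'packed' drops the alignment requirement
 * to 1, so the compiler may not assume an aligned address. */
struct sketch_tx_hdr {
	uint32_t tx_cmd_a;
	uint32_t tx_cmd_b;
} __attribute__((packed));

_Static_assert(sizeof(struct sketch_tx_hdr) == 8, "header must be 8 bytes");

/* Fill the header in place. Values are stored in host byte order here;
 * the driver would convert with cpu_to_le32() first. */
static void sketch_fill_hdr(void *p, uint32_t a, uint32_t b)
{
	struct sketch_tx_hdr *hdr = p;

	assert(hdr != NULL);
	hdr->tx_cmd_a = a;
	hdr->tx_cmd_b = b;
}
```

Because the structure is packed, plain member assignment is enough and no explicit unaligned-store helper is needed, at the cost of the compiler emitting alignment-safe stores on architectures that require them.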

	David

Patch

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index cb19aea139d3..813ab93ee2c3 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -2006,6 +2006,7 @@  static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
 	bool csum = skb->ip_summed == CHECKSUM_PARTIAL;
 	int overhead = csum ? SMSC95XX_TX_OVERHEAD_CSUM : SMSC95XX_TX_OVERHEAD;
 	u32 tx_cmd_a, tx_cmd_b;
+	void *ptr;
 
 	/* We do not advertise SG, so skbs should be already linearized */
 	BUG_ON(skb_shinfo(skb)->nr_frags);
@@ -2019,6 +2020,9 @@  static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
 		return NULL;
 	}
 
+	tx_cmd_b = (u32)skb->len;
+	tx_cmd_a = tx_cmd_b | TX_CMD_A_FIRST_SEG_ | TX_CMD_A_LAST_SEG_;
+
 	if (csum) {
 		if (skb->len <= 45) {
 			/* workaround - hardware tx checksum does not work
@@ -2035,21 +2039,18 @@  static struct sk_buff *smsc95xx_tx_fixup(struct usbnet *dev,
 			skb_push(skb, 4);
 			cpu_to_le32s(&csum_preamble);
 			memcpy(skb->data, &csum_preamble, 4);
+
+			tx_cmd_a += 4;
+			tx_cmd_b += 4;
+			tx_cmd_b |= TX_CMD_B_CSUM_ENABLE;
 		}
 	}
 
-	skb_push(skb, 4);
-	tx_cmd_b = (u32)(skb->len - 4);
-	if (csum)
-		tx_cmd_b |= TX_CMD_B_CSUM_ENABLE;
-	cpu_to_le32s(&tx_cmd_b);
-	memcpy(skb->data, &tx_cmd_b, 4);
-
-	skb_push(skb, 4);
-	tx_cmd_a = (u32)(skb->len - 8) | TX_CMD_A_FIRST_SEG_ |
-		TX_CMD_A_LAST_SEG_;
-	cpu_to_le32s(&tx_cmd_a);
-	memcpy(skb->data, &tx_cmd_a, 4);
+	ptr = skb_push(skb, 8);
+	tx_cmd_a = cpu_to_le32(tx_cmd_a);
+	tx_cmd_b = cpu_to_le32(tx_cmd_b);
+	memcpy(ptr, &tx_cmd_a, 4);
+	memcpy(ptr+4, &tx_cmd_b, 4);
 
 	return skb;
 }
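
The patch moves the command-word arithmetic ahead of the pushes, so it is worth checking that both flows produce the same values. The following userspace model is a sketch only: the flag values are assumed for illustration (check smsc95xx.h for the real ones), the `sk_` names are hypothetical, and `hw_csum` stands for "hardware checksum is still in use after the small-packet workaround".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values assumed for illustration only. */
#define SK_TX_CMD_A_FIRST_SEG   0x00002000u
#define SK_TX_CMD_A_LAST_SEG    0x00001000u
#define SK_TX_CMD_B_CSUM_ENABLE 0x00004000u

struct sk_cmds { uint32_t a, b; };

/* Old flow: derive each word from the length after every push. */
static struct sk_cmds sk_old(uint32_t len, bool hw_csum)
{
	struct sk_cmds c;

	if (hw_csum)
		len += 4;	/* skb_push(skb, 4) for the csum preamble */
	len += 4;		/* skb_push(skb, 4) for tx_cmd_b */
	c.b = len - 4;
	if (hw_csum)
		c.b |= SK_TX_CMD_B_CSUM_ENABLE;
	len += 4;		/* skb_push(skb, 4) for tx_cmd_a */
	c.a = (len - 8) | SK_TX_CMD_A_FIRST_SEG | SK_TX_CMD_A_LAST_SEG;
	return c;
}

/* New flow: compute both words up front, then adjust for the preamble. */
static struct sk_cmds sk_new(uint32_t len, bool hw_csum)
{
	struct sk_cmds c;

	c.b = len;
	c.a = c.b | SK_TX_CMD_A_FIRST_SEG | SK_TX_CMD_A_LAST_SEG;
	if (hw_csum) {
		c.a += 4;
		c.b += 4;
		c.b |= SK_TX_CMD_B_CSUM_ENABLE;
	}
	return c;
}

/* Returns true when both flows agree for the given input. */
static bool sk_same(uint32_t len, bool hw_csum)
{
	struct sk_cmds o, n;

	/* Keep the length clear of the flag bits so '+= 4' cannot carry into them. */
	assert(len + 4u < SK_TX_CMD_A_LAST_SEG);
	o = sk_old(len, hw_csum);
	n = sk_new(len, hw_csum);
	return o.a == n.a && o.b == n.b;
}
```

Under these assumptions the two flows agree for any length that stays below the flag bits, which is why the `+= 4` on a word that already has flags ORed in is harmless.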