
[net-next] xen-netback: Grant copy the header instead of map and memcpy

Message ID 1395868707-22683-1-git-send-email-zoltan.kiss@citrix.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Zoltan Kiss March 26, 2014, 9:18 p.m. UTC
An old inefficiency of the TX path is that we grant map the first slot
and then memcpy the header part into the linear area. Instead, doing a
grant copy for that header straight away is more reasonable, especially
because there are ongoing efforts to make Xen avoid the TLB flush after
unmap when the page was not touched in Dom0. The memcpy in the original
scheme defeated that.
The key changes:
- the vif has a tx_copy_ops array again
- xenvif_tx_build_gops sets up the grant copy operations
- we don't have to figure out whether the header and first frag are on the same
  grant mapped page or not

Unrelated, but I've made a small refactoring in xenvif_get_extras as well.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/common.h  |    1 +
 drivers/net/xen-netback/netback.c |  100 ++++++++++++++++++-------------------
 2 files changed, 49 insertions(+), 52 deletions(-)


Comments

Ian Campbell March 27, 2014, 11:35 a.m. UTC | #1
On Wed, 2014-03-26 at 21:18 +0000, Zoltan Kiss wrote:
> An old inefficiency of the TX path is that we grant map the first slot
> and then memcpy the header part into the linear area. Instead, doing a
> grant copy for that header straight away is more reasonable, especially
> because there are ongoing efforts to make Xen avoid the TLB flush after
> unmap when the page was not touched in Dom0. The memcpy in the original
> scheme defeated that.
> The key changes:
> - the vif has a tx_copy_ops array again
> - xenvif_tx_build_gops sets up the grant copy operations
> - we don't have to figure out whether the header and first frag are on the same
>   grant mapped page or not
> 
> Unrelated, but I've made a small refactoring in xenvif_get_extras as well.

Just a few thoughts, not really based on close review of the code.

Do you have any measurements for this series or workloads where it is
particularly beneficial?

You've added a second hypercall to tx_action; those can probably be
combined into one VM exit by using a multicall. Also, you should omit
either call if its respective nr_?ops is zero (can nr_cops ever be zero?)
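
For illustration, a minimal sketch of what such a multicall might look
like (this assumes the generic HYPERVISOR_multicall interface and the
GNTTABOP_* commands; a real version would presumably go through a
grant-table.c helper, as discussed later in the thread):

	struct multicall_entry mc[2];

	/* batch both grant table ops into a single VM exit;
	 * per-op status lands in mc[i].result */
	mc[0].op = __HYPERVISOR_grant_table_op;
	mc[0].args[0] = GNTTABOP_copy;
	mc[0].args[1] = (unsigned long)vif->tx_copy_ops;
	mc[0].args[2] = nr_cops;

	mc[1].op = __HYPERVISOR_grant_table_op;
	mc[1].args[0] = GNTTABOP_map_grant_ref;
	mc[1].args[1] = (unsigned long)vif->tx_map_ops;
	mc[1].args[2] = nr_gops;

	ret = HYPERVISOR_multicall(mc, 2);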

Adding another MAX_PENDING_REQS sized array is unfortunate. Could we
perhaps get away with only ever doing a copy for the first N requests
(for small N) in a frame, copying up the request? N should be chosen to
be large enough to cover the more common cases (which I suppose is
Windows, which puts the network and transport headers in separate
slots). This might allow the copy array to be smaller, at the expense
of still doing the map+memcpy for some corner cases.

Or (and I confess this is a skanky hack): we overlay tx_copy_ops and
tx_map_ops in a union, and insert one set of ops from the front and the
other from the end, taking great care around when and where they meet.


>  static int xenvif_tx_check_gop(struct xenvif *vif,
>  			       struct sk_buff *skb,
> -			       struct gnttab_map_grant_ref **gopp)
> +			       struct gnttab_map_grant_ref **gopp,
> +			       struct gnttab_copy **gopp_copy)

I think a precursor patch which only does s/gopp/gopp_map/ would be
beneficial.
> @@ -1164,7 +1147,9 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
>  	return false;
>  }
>  
> -static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
> +static unsigned xenvif_tx_build_gops(struct xenvif *vif,
> +				     int budget,
> +				     unsigned *copy_ops)

I think you should turn the nr map ops into an explicit pointer return
too; having one thing go via the formal return code and another via a
pointer is a bit odd.
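
i.e. something like this (a sketch of the suggested signature only; the
map_ops parameter name is illustrative):

	static void xenvif_tx_build_gops(struct xenvif *vif,
					 int budget,
					 unsigned *copy_ops,
					 unsigned *map_ops);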

>  	struct gnttab_map_grant_ref *gop = vif->tx_map_ops, *request_gop;
>  	struct sk_buff *skb;
> @@ -1267,24 +1252,37 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
>  			}
>  		}
>  
> -		xenvif_tx_create_gop(vif, pending_idx, &txreq, gop);
> -
> -		gop++;
> -
>  		XENVIF_TX_CB(skb)->pending_idx = pending_idx;
>  
>  		__skb_put(skb, data_len);

Can't you allocate the skb with sufficient headroom? (or maybe I've
forgotten again how skb payload management works and __skb_put is
effectively free on an empty skb?)

> +		vif->tx_copy_ops[*copy_ops].source.u.ref = txreq.gref;
> +		vif->tx_copy_ops[*copy_ops].source.domid = vif->domid;
> +		vif->tx_copy_ops[*copy_ops].source.offset = txreq.offset;
> +
> +		vif->tx_copy_ops[*copy_ops].dest.u.gmfn = virt_to_mfn(skb->data);
> +		vif->tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF;
> +		vif->tx_copy_ops[*copy_ops].dest.offset = offset_in_page(skb->data);
> +
> +		vif->tx_copy_ops[*copy_ops].len = data_len;

Do we want to copy the entire first frag even if it is e.g. a complete
page?

I'm not sure where the tradeoff lies between doing a grant copy of more
than necessary and doing a map+memcpy when the map is already required
for the page frag anyway.

What about the case where the first frag is less than PKT_PROT_LEN? I
think you still do map+memcpy in that case?

> @@ -1375,6 +1374,7 @@ static int xenvif_handle_frag_list(struct xenvif *vif, struct sk_buff *skb)
>  static int xenvif_tx_submit(struct xenvif *vif)
>  {
>  	struct gnttab_map_grant_ref *gop = vif->tx_map_ops;

Another candidate for a precursor patch renaming for clarity.

Ian.

Zoltan Kiss March 27, 2014, 12:46 p.m. UTC | #2
On 27/03/14 11:35, Ian Campbell wrote:
> On Wed, 2014-03-26 at 21:18 +0000, Zoltan Kiss wrote:
>> An old inefficiency of the TX path is that we grant map the first slot
>> and then memcpy the header part into the linear area. Instead, doing a
>> grant copy for that header straight away is more reasonable, especially
>> because there are ongoing efforts to make Xen avoid the TLB flush after
>> unmap when the page was not touched in Dom0. The memcpy in the original
>> scheme defeated that.
>> The key changes:
>> - the vif has a tx_copy_ops array again
>> - xenvif_tx_build_gops sets up the grant copy operations
>> - we don't have to figure out whether the header and first frag are on the same
>>    grant mapped page or not
>>
>> Unrelated, but I've made a small refactoring in xenvif_get_extras as well.
>
> Just a few thoughts, not really based on close review of the code.
>
> Do you have any measurements for this series or workloads where it is
> particularly beneficial?
My manual measurements showed that if I also use the Xen patch which 
avoids TLB flush when the pages were not touched, I can easily get 9 - 
9.3 Gbps with iperf. But I'll run some more automated tests.

>
> You've added a second hypercall to tx_action; those can probably be
> combined into one VM exit by using a multicall.
Yes, good idea!

> Also, you should omit
> either call if its respective nr_?ops is zero (can nr_cops ever be zero?)
If nr_cops is zero, nr_gops should be too, but it can happen that we
have only copy ops and no map ops (all the packets processed are <
PKT_PROT_LEN).
Actually there is a bug in tx_action: it shouldn't return when (nr_cops
!= 0) && (nr_gops == 0).
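
For clarity, the fixed tail of xenvif_tx_action would then look roughly
like this (a sketch based on the calls already in this patch; the map
step is simply skipped when there are no map ops):

	nr_gops = xenvif_tx_build_gops(vif, budget, &nr_cops);

	if (nr_cops == 0)
		return 0; /* no copy ops implies no map ops either */

	gnttab_batch_copy(vif->tx_copy_ops, nr_cops);
	if (nr_gops) {
		ret = gnttab_map_refs(vif->tx_map_ops, NULL,
				      vif->pages_to_map, nr_gops);
		BUG_ON(ret);
	}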

>
> Adding another MAX_PENDING_REQS sized array is unfortunate. Could we
> perhaps get away with only ever doing a copy for the first N requests
> (for small N) in a frame, copying up the request? N should be chosen to
> be large enough to cover the more common cases (which I suppose is
> Windows, which puts the network and transport headers in separate
> slots). This might allow the copy array to be smaller, at the expense
> of still doing the map+memcpy for some corner cases.
N is already 1; we already copy only the first slot. But here we should
be prepared for the unlikely situation that a guest suddenly sends 256
packets, all with one slot.

>
> Or (and I confess this is a skanky hack): we overlay tx_copy_ops and
> tx_map_ops in a union, and insert one set of ops from the front and the
> other from the end, taking great care around when and where they meet.
Unfortunately the hypercalls expect an array of _one_ of them, so we
can't put the 2 types into a union unless it's guaranteed that they
have the same size. And they are expected to be

>
>
>>   static int xenvif_tx_check_gop(struct xenvif *vif,
>>   			       struct sk_buff *skb,
>> -			       struct gnttab_map_grant_ref **gopp)
>> +			       struct gnttab_map_grant_ref **gopp,
>> +			       struct gnttab_copy **gopp_copy)
>
> I think a precursor patch which only does s/gopp/gopp_map/ would be
> beneficial.
It's 5 renamings in 4 hunks; I think it's fine to do them in the same patch

>> @@ -1164,7 +1147,9 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
>>   	return false;
>>   }
>>
>> -static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
>> +static unsigned xenvif_tx_build_gops(struct xenvif *vif,
>> +				     int budget,
>> +				     unsigned *copy_ops)
>
> I think you should turn the nr map ops into an explicit pointer return
> too; having one thing go via the formal return code and another via a
> pointer is a bit odd.
Ok


>
>>   	struct gnttab_map_grant_ref *gop = vif->tx_map_ops, *request_gop;
>>   	struct sk_buff *skb;
>> @@ -1267,24 +1252,37 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
>>   			}
>>   		}
>>
>> -		xenvif_tx_create_gop(vif, pending_idx, &txreq, gop);
>> -
>> -		gop++;
>> -
>>   		XENVIF_TX_CB(skb)->pending_idx = pending_idx;
>>
>>   		__skb_put(skb, data_len);
>
> Can't you allocate the skb with sufficient headroom? (or maybe I've
> forgotten again how skb payload management works and __skb_put is
> effectively free on an empty skb?)
It effectively does:

	skb->tail += len;
	skb->len  += len;

So we can copy data_len between head and tail. The skb is already
allocated in xenvif_alloc_skb with a buffer size of NET_SKB_PAD +
NET_IP_ALIGN + data_len, and the headroom is NET_SKB_PAD + NET_IP_ALIGN.
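
In other words, the path being described is roughly this (a paraphrase
of xenvif_alloc_skb and the call above, not its exact code):

	skb = alloc_skb(NET_SKB_PAD + NET_IP_ALIGN + data_len, GFP_ATOMIC);
	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN); /* headroom */
	__skb_put(skb, data_len); /* only moves tail/len, copies nothing */
	/* the grant copy then lands the header at skb->data */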


>
>> +		vif->tx_copy_ops[*copy_ops].source.u.ref = txreq.gref;
>> +		vif->tx_copy_ops[*copy_ops].source.domid = vif->domid;
>> +		vif->tx_copy_ops[*copy_ops].source.offset = txreq.offset;
>> +
>> +		vif->tx_copy_ops[*copy_ops].dest.u.gmfn = virt_to_mfn(skb->data);
>> +		vif->tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF;
>> +		vif->tx_copy_ops[*copy_ops].dest.offset = offset_in_page(skb->data);
>> +
>> +		vif->tx_copy_ops[*copy_ops].len = data_len;
>
> Do we want to copy the entire first frag even if it is e.g. a complete
> page?
>
> I'm not sure where the tradeoff lies between doing a grant copy of more
> than necessary and doing a map+memcpy when the map is already required
> for the page frag anyway.
I think we should only grant copy the part which goes to the linear
area, because we would memcpy it anyway. It's expected that the stack
won't touch anything else while the packet passes through. data_len is
the size of the linear buffer here.
>
> What about the case where the first frag is less than PKT_PROT_LEN? I
> think you still do map+memcpy in that case?
Then data_len will be txreq.size, we allocate the skb for that, and
later on we call __pskb_pull_tail in xenvif_tx_submit to pull up to
that (which is essentially map+memcpy). It's not optimal, but it's been
like that since the good old days. I agree it could be optimized, but
let's change it in a different patch!
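
For reference, the pull-up in xenvif_tx_submit being referred to is
roughly this (a sketch; PKT_PROT_LEN is the existing netback constant):

	if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {
		int target = min_t(int, skb->len, PKT_PROT_LEN);

		/* pulls frag data into the linear area, i.e. the
		 * map+memcpy mentioned above */
		__pskb_pull_tail(skb, target - skb_headlen(skb));
	}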

>
>> @@ -1375,6 +1374,7 @@ static int xenvif_handle_frag_list(struct xenvif *vif, struct sk_buff *skb)
>>   static int xenvif_tx_submit(struct xenvif *vif)
>>   {
>>   	struct gnttab_map_grant_ref *gop = vif->tx_map_ops;
>
> Another candidate for a precursor patch renaming for clarity.
>
> Ian.
>

Ian Campbell March 27, 2014, 12:54 p.m. UTC | #3
On Thu, 2014-03-27 at 12:46 +0000, Zoltan Kiss wrote:
> On 27/03/14 11:35, Ian Campbell wrote:
> > On Wed, 2014-03-26 at 21:18 +0000, Zoltan Kiss wrote:
> >> An old inefficiency of the TX path is that we grant map the first slot
> >> and then memcpy the header part into the linear area. Instead, doing a
> >> grant copy for that header straight away is more reasonable, especially
> >> because there are ongoing efforts to make Xen avoid the TLB flush after
> >> unmap when the page was not touched in Dom0. The memcpy in the original
> >> scheme defeated that.
> >> The key changes:
> >> - the vif has a tx_copy_ops array again
> >> - xenvif_tx_build_gops sets up the grant copy operations
> >> - we don't have to figure out whether the header and first frag are on the same
> >>    grant mapped page or not
> >>
> >> Unrelated, but I've made a small refactoring in xenvif_get_extras as well.
> >
> > Just a few thoughts, not really based on close review of the code.
> >
> > Do you have any measurements for this series or workloads where it is
> > particularly beneficial?
> My manual measurements showed that if I also use the Xen patch which 
> avoids TLB flush when the pages were not touched, I can easily get 9 - 
> 9.3 Gbps with iperf. But I'll run some more automated tests.

I suppose these numbers are related because avoiding the memcpy avoids
touching the page?

> >
> > You've added a second hypercall to tx_action; those can probably be
> > combined into one VM exit by using a multicall.
> Yes, good idea!
> 
> > Also, you should omit
> > either call if its respective nr_?ops is zero (can nr_cops ever be zero?)
> If nr_cops is zero, nr_gops should be too, but it can happen that we
> have only copy ops and no map ops (all the packets processed are <
> PKT_PROT_LEN).

That sounds like what I would expect, good.

> Actually there is a bug in tx_action: it shouldn't return when (nr_cops
> != 0) && (nr_gops == 0).
> 
> >
> > Adding another MAX_PENDING_REQS sized array is unfortunate. Could we
> > perhaps get away with only ever doing a copy for the first N requests
> > (for small N) in a frame, copying up the request? N should be chosen to
> > be large enough to cover the more common cases (which I suppose is
> > Windows, which puts the network and transport headers in separate
> > slots). This might allow the copy array to be smaller, at the expense
> > of still doing the map+memcpy for some corner cases.
> N is already 1; we already copy only the first slot. But here we should
> be prepared for the unlikely situation that a guest suddenly sends 256
> packets, all with one slot.

Right, I was wondering if we could just do smaller batches in that case,
but I expect that would be intolerably sucky.

> >
> > Or (and I confess this is a skanky hack): we overlay tx_copy_ops and
> > tx_map_ops in a union, and insert one set of ops from the front and the
> > other from the end, taking great care around when and where they meet.
> Unfortunately the hypercalls expect an array of _one_ of them, so we
> can't put the 2 types into a union unless it's guaranteed that they
> have the same size. And they are expected to be

(was there more of this last sentence?)

I was thinking 
	union {
		struct gnt_map maps[NR_...];
		struct gnt_copy copy[NR_...];
	}

Then you fill copy from 0..N and maps from M..NR_ and maintain the
invariant that
         (void *)&maps[M] > (void *)&copy[N]

and pass to the grant ops (&copy[0], N) and (&maps[M], NR_... - M)
respectively.

Too tricksy perhaps?
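
Spelled out a little more (still only a sketch of the idea; the ncopy
and mfirst bounds are illustrative names):

	union {
		struct gnttab_map_grant_ref map[MAX_PENDING_REQS];
		struct gnttab_copy copy[MAX_PENDING_REQS];
	} tx_ops;

	/* in the gop-building loop: copies fill from index 0 upwards,
	 * maps from the top down; stop before the regions would overlap */
	if ((void *)&tx_ops.copy[ncopy + 1] > (void *)&tx_ops.map[mfirst])
		break;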

> 
> >
> >
> >>   static int xenvif_tx_check_gop(struct xenvif *vif,
> >>   			       struct sk_buff *skb,
> >> -			       struct gnttab_map_grant_ref **gopp)
> >> +			       struct gnttab_map_grant_ref **gopp,
> >> +			       struct gnttab_copy **gopp_copy)
> >
> > I think a precursor patch which only does s/gopp/gopp_map/ would be
> > beneficial.
> It's 5 renamings in 4 hunks; I think it's fine to do them in the same patch

If you think it won't impact the readability of the meat of the patch
then go for it, but it seems trivial to do it first and avoid the risk
that I might think otherwise.


> >
> >>   	struct gnttab_map_grant_ref *gop = vif->tx_map_ops, *request_gop;
> >>   	struct sk_buff *skb;
> >> @@ -1267,24 +1252,37 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
> >>   			}
> >>   		}
> >>
> >> -		xenvif_tx_create_gop(vif, pending_idx, &txreq, gop);
> >> -
> >> -		gop++;
> >> -
> >>   		XENVIF_TX_CB(skb)->pending_idx = pending_idx;
> >>
> >>   		__skb_put(skb, data_len);
> >
> > Can't you allocate the skb with sufficient headroom? (or maybe I've
> > forgotten again how skb payload management works and __skb_put is
> > effectively free on an empty skb?)
> It effectively does:
> 
> 	skb->tail += len;
> 	skb->len  += len;
> 
> So we can copy data_len between head and tail. The skb is already
> allocated in xenvif_alloc_skb with a buffer size of NET_SKB_PAD +
> NET_IP_ALIGN + data_len, and the headroom is NET_SKB_PAD + NET_IP_ALIGN.
> 

Great.

> >
> >> +		vif->tx_copy_ops[*copy_ops].source.u.ref = txreq.gref;
> >> +		vif->tx_copy_ops[*copy_ops].source.domid = vif->domid;
> >> +		vif->tx_copy_ops[*copy_ops].source.offset = txreq.offset;
> >> +
> >> +		vif->tx_copy_ops[*copy_ops].dest.u.gmfn = virt_to_mfn(skb->data);
> >> +		vif->tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF;
> >> +		vif->tx_copy_ops[*copy_ops].dest.offset = offset_in_page(skb->data);
> >> +
> >> +		vif->tx_copy_ops[*copy_ops].len = data_len;
> >
> > Do we want to copy the entire first frag even if it is e.g. a complete
> > page?
> >
> > I'm not sure where the tradeoff lies between doing a grant copy of more
> > than necessary and doing a map+memcpy when the map is already required
> > for the page frag anyway.
> I think we should only grant copy the part which goes to the linear
> area, because we would memcpy it anyway. It's expected that the stack
> won't touch anything else while the packet passes through. data_len is
> the size of the linear buffer here.
> >
> > What about the case where the first frag is less than PKT_PROT_LEN? I
> > think you still do map+memcpy in that case?
> Then data_len will be txreq.size, we allocate the skb for that, and
> later on we call __pskb_pull_tail in xenvif_tx_submit to pull up to
> that (which is essentially map+memcpy). It's not optimal, but it's been
> like that since the good old days. I agree it could be optimized, but
> let's change it in a different patch!

OK. Can you clarify the title and/or the commit log to make it obvious
that we only copy (some portion of) the first frag and that we still
map+memcpy the remainder of PKT_PROT_LEN if the first frag is not large
enough?

Thanks,
Ian.

Zoltan Kiss March 27, 2014, 2:26 p.m. UTC | #4
On 27/03/14 12:54, Ian Campbell wrote:
> On Thu, 2014-03-27 at 12:46 +0000, Zoltan Kiss wrote:
>> On 27/03/14 11:35, Ian Campbell wrote:
>>> On Wed, 2014-03-26 at 21:18 +0000, Zoltan Kiss wrote:
>>>> An old inefficiency of the TX path is that we grant map the first slot
>>>> and then memcpy the header part into the linear area. Instead, doing a
>>>> grant copy for that header straight away is more reasonable, especially
>>>> because there are ongoing efforts to make Xen avoid the TLB flush after
>>>> unmap when the page was not touched in Dom0. The memcpy in the original
>>>> scheme defeated that.
>>>> The key changes:
>>>> - the vif has a tx_copy_ops array again
>>>> - xenvif_tx_build_gops sets up the grant copy operations
>>>> - we don't have to figure out whether the header and first frag are on the same
>>>>     grant mapped page or not
>>>>
>>>> Unrelated, but I've made a small refactoring in xenvif_get_extras as well.
>>>
>>> Just a few thoughts, not really based on close review of the code.
>>>
>>> Do you have any measurements for this series or workloads where it is
>>> particularly beneficial?
>> My manual measurements showed that if I also use the Xen patch which
>> avoids TLB flush when the pages were not touched, I can easily get 9 -
>> 9.3 Gbps with iperf. But I'll run some more automated tests.
>
> I suppose these numbers are related because avoiding the memcpy avoids
> touching the page?
Yes

>>> Or (and I confess this is a skanky hack): we overlay tx_copy_ops and
>>> tx_map_ops in a union, and insert one set of ops from the front and the
>>> other from the end, taking great care around when and where they meet.
>> Unfortunately the hypercalls expect an array of _one_ of them, so we
>> can't put the 2 types into a union unless it's guaranteed that they
>> have the same size. And they are expected to be
>
> (was there more of this last sentence?)
>
> I was thinking
> 	union {
> 		struct gnt_map maps[NR_...];
> 		struct gnt_copy copy[NR_...];
> 	}
>
> Then you fill copy from 0..N and maps from M..NR_ and maintain the
> invariant that
>           (void *)&maps[M] > (void *)&copy[N]
>
> and pass to the grant ops (&copy[0], N) and (&maps[M], NR_... - M)
> respectively.
>
> Too tricksy perhaps?

Yeah, this array is 30*256 bytes (7.5 kB) long per vif; I don't think
it's worth the fuss.

>>>
>>>> +		vif->tx_copy_ops[*copy_ops].source.u.ref = txreq.gref;
>>>> +		vif->tx_copy_ops[*copy_ops].source.domid = vif->domid;
>>>> +		vif->tx_copy_ops[*copy_ops].source.offset = txreq.offset;
>>>> +
>>>> +		vif->tx_copy_ops[*copy_ops].dest.u.gmfn = virt_to_mfn(skb->data);
>>>> +		vif->tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF;
>>>> +		vif->tx_copy_ops[*copy_ops].dest.offset = offset_in_page(skb->data);
>>>> +
>>>> +		vif->tx_copy_ops[*copy_ops].len = data_len;
>>>
>>> Do we want to copy the entire first frag even if it is e.g. a complete
>>> page?
>>>
>>> I'm not sure where the tradeoff lies between doing a grant copy of more
>>> than necessary and doing a map+memcpy when the map is already required
>>> for the page frag anyway.
>> I think we should only grant copy the part which goes to the linear
>> area, because we would memcpy it anyway. It's expected that the stack
>> won't touch anything else while the packet passes through. data_len is
>> the size of the linear buffer here.
>>>
>>> What about the case where the first frag is less than PKT_PROT_LEN? I
>>> think you still do map+memcpy in that case?
>> Then data_len will be txreq.size, we allocate the skb for that, and
>> later on we call __pskb_pull_tail in xenvif_tx_submit to pull up to
>> that (which is essentially map+memcpy). It's not optimal, but it's been
>> like that since the good old days. I agree it could be optimized, but
>> let's change it in a different patch!
>
> OK. Can you clarify the title and/or the commit log to make it obvious
> that we only copy (some portion of) the first frag and that we still
> map+memcpy the remainder of PKT_PROT_LEN if the first frag is not large
> enough?
Ok

Zoltan Kiss March 27, 2014, 3:16 p.m. UTC | #5
On 27/03/14 12:46, Zoltan Kiss wrote:
> On 27/03/14 11:35, Ian Campbell wrote:
>> On Wed, 2014-03-26 at 21:18 +0000, Zoltan Kiss wrote:
>> You've added a second hypercall to tx_action; those can probably be
>> combined into one VM exit by using a multicall.
> Yes, good idea!

Thinking a bit further, it would need some grant-table.c changes first. 
I've put it onto my todo list.

Zoli

Patch

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 89b2d42..c995532 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -119,6 +119,7 @@  struct xenvif {
 	struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
 	grant_handle_t grant_tx_handle[MAX_PENDING_REQS];
 
+	struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
 	struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
 	struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
 	/* passed to gnttab_[un]map_refs with pages under (un)mapping */
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 0efa32d..2d33078 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -915,35 +915,30 @@  static inline void xenvif_grant_handle_reset(struct xenvif *vif,
 
 static int xenvif_tx_check_gop(struct xenvif *vif,
 			       struct sk_buff *skb,
-			       struct gnttab_map_grant_ref **gopp)
+			       struct gnttab_map_grant_ref **gopp,
+			       struct gnttab_copy **gopp_copy)
 {
 	struct gnttab_map_grant_ref *gop = *gopp;
 	u16 pending_idx = XENVIF_TX_CB(skb)->pending_idx;
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
-	struct pending_tx_info *tx_info;
 	int nr_frags = shinfo->nr_frags;
-	int i, err, start;
+	int i, err;
 	struct sk_buff *first_skb = NULL;
 
 	/* Check status of header. */
-	err = gop->status;
+	err = (*gopp_copy)->status;
+	(*gopp_copy)++;
 	if (unlikely(err))
 		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR);
-	else
-		xenvif_grant_handle_set(vif, pending_idx , gop->handle);
-
-	/* Skip first skb fragment if it is on same page as header fragment. */
-	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
 
 check_frags:
-	for (i = start; i < nr_frags; i++) {
+	for (i = 0; i < nr_frags; i++, gop++) {
 		int j, newerr;
 
 		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
-		tx_info = &vif->pending_tx_info[pending_idx];
 
 		/* Check error status: if okay then remember grant handle. */
-		newerr = (++gop)->status;
+		newerr = (gop)->status;
 
 		if (likely(!newerr)) {
 			xenvif_grant_handle_set(vif, pending_idx , gop->handle);
@@ -959,13 +954,8 @@  check_frags:
 		/* Not the first error? Preceding frags already invalidated. */
 		if (err)
 			continue;
-		/* First error: invalidate header and preceding fragments. */
-		if (!first_skb)
-			pending_idx = XENVIF_TX_CB(skb)->pending_idx;
-		else
-			pending_idx = XENVIF_TX_CB(skb)->pending_idx;
-		xenvif_idx_unmap(vif, pending_idx);
-		for (j = start; j < i; j++) {
+		/* First error: invalidate preceding fragments. */
+		for (j = 0; j < i; j++) {
 			pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
 			xenvif_idx_unmap(vif, pending_idx);
 		}
@@ -979,7 +969,6 @@  check_frags:
 		skb = shinfo->frag_list;
 		shinfo = skb_shinfo(skb);
 		nr_frags = shinfo->nr_frags;
-		start = 0;
 
 		goto check_frags;
 	}
@@ -990,15 +979,13 @@  check_frags:
 	if (first_skb && err) {
 		int j;
 		shinfo = skb_shinfo(first_skb);
-		pending_idx = XENVIF_TX_CB(skb)->pending_idx;
-		start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
-		for (j = start; j < shinfo->nr_frags; j++) {
+		for (j = 0; j < shinfo->nr_frags; j++) {
 			pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
 			xenvif_idx_unmap(vif, pending_idx);
 		}
 	}
 
-	*gopp = gop + 1;
+	*gopp = gop;
 	return err;
 }
 
@@ -1009,9 +996,6 @@  static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
 	int i;
 	u16 prev_pending_idx = INVALID_PENDING_IDX;
 
-	if (skb_shinfo(skb)->destructor_arg)
-		prev_pending_idx = XENVIF_TX_CB(skb)->pending_idx;
-
 	for (i = 0; i < nr_frags; i++) {
 		skb_frag_t *frag = shinfo->frags + i;
 		struct xen_netif_tx_request *txp;
@@ -1021,10 +1005,10 @@  static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
 		pending_idx = frag_get_pending_idx(frag);
 
 		/* If this is not the first frag, chain it to the previous*/
-		if (unlikely(prev_pending_idx == INVALID_PENDING_IDX))
+		if (prev_pending_idx == INVALID_PENDING_IDX)
 			skb_shinfo(skb)->destructor_arg =
 				&callback_param(vif, pending_idx);
-		else if (likely(pending_idx != prev_pending_idx))
+		else
 			callback_param(vif, prev_pending_idx).ctx =
 				&callback_param(vif, pending_idx);
 
@@ -1065,9 +1049,9 @@  static int xenvif_get_extras(struct xenvif *vif,
 
 		memcpy(&extra, RING_GET_REQUEST(&vif->tx, cons),
 		       sizeof(extra));
+		vif->tx.req_cons = ++cons;
 		if (unlikely(!extra.type ||
 			     extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
-			vif->tx.req_cons = ++cons;
 			netdev_err(vif->dev,
 				   "Invalid extra type: %d\n", extra.type);
 			xenvif_fatal_tx_err(vif);
@@ -1075,7 +1059,6 @@  static int xenvif_get_extras(struct xenvif *vif,
 		}
 
 		memcpy(&extras[extra.type - 1], &extra, sizeof(extra));
-		vif->tx.req_cons = ++cons;
 	} while (extra.flags & XEN_NETIF_EXTRA_FLAG_MORE);
 
 	return work_to_do;
@@ -1164,7 +1147,9 @@  static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 	return false;
 }
 
-static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
+static unsigned xenvif_tx_build_gops(struct xenvif *vif,
+				     int budget,
+				     unsigned *copy_ops)
 {
 	struct gnttab_map_grant_ref *gop = vif->tx_map_ops, *request_gop;
 	struct sk_buff *skb;
@@ -1267,24 +1252,37 @@  static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 			}
 		}
 
-		xenvif_tx_create_gop(vif, pending_idx, &txreq, gop);
-
-		gop++;
-
 		XENVIF_TX_CB(skb)->pending_idx = pending_idx;
 
 		__skb_put(skb, data_len);
+		vif->tx_copy_ops[*copy_ops].source.u.ref = txreq.gref;
+		vif->tx_copy_ops[*copy_ops].source.domid = vif->domid;
+		vif->tx_copy_ops[*copy_ops].source.offset = txreq.offset;
+
+		vif->tx_copy_ops[*copy_ops].dest.u.gmfn = virt_to_mfn(skb->data);
+		vif->tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF;
+		vif->tx_copy_ops[*copy_ops].dest.offset = offset_in_page(skb->data);
+
+		vif->tx_copy_ops[*copy_ops].len = data_len;
+		vif->tx_copy_ops[*copy_ops].flags = GNTCOPY_source_gref;
+
+		(*copy_ops)++;
 
 		skb_shinfo(skb)->nr_frags = ret;
 		if (data_len < txreq.size) {
 			skb_shinfo(skb)->nr_frags++;
 			frag_set_pending_idx(&skb_shinfo(skb)->frags[0],
 					     pending_idx);
+			xenvif_tx_create_gop(vif, pending_idx, &txreq, gop);
+			gop++;
 		} else {
 			frag_set_pending_idx(&skb_shinfo(skb)->frags[0],
 					     INVALID_PENDING_IDX);
+			memcpy(&vif->pending_tx_info[pending_idx].req, &txreq,
+			       sizeof(txreq));
 		}
 
+
 		vif->pending_cons++;
 
 		request_gop = xenvif_get_requests(vif, skb, txfrags, gop);
@@ -1299,7 +1297,8 @@  static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 
 		vif->tx.req_cons = idx;
 
-		if ((gop-vif->tx_map_ops) >= ARRAY_SIZE(vif->tx_map_ops))
+		if (((gop-vif->tx_map_ops) >= ARRAY_SIZE(vif->tx_map_ops)) ||
+		    (*copy_ops >= ARRAY_SIZE(vif->tx_map_ops)))
 			break;
 	}
 
@@ -1375,6 +1374,7 @@  static int xenvif_handle_frag_list(struct xenvif *vif, struct sk_buff *skb)
 static int xenvif_tx_submit(struct xenvif *vif)
 {
 	struct gnttab_map_grant_ref *gop = vif->tx_map_ops;
+	struct gnttab_copy *gop_copy = vif->tx_copy_ops;
 	struct sk_buff *skb;
 	int work_done = 0;
 
@@ -1387,7 +1387,7 @@  static int xenvif_tx_submit(struct xenvif *vif)
 		txp = &vif->pending_tx_info[pending_idx].req;
 
 		/* Check the remap error code. */
-		if (unlikely(xenvif_tx_check_gop(vif, skb, &gop))) {
+		if (unlikely(xenvif_tx_check_gop(vif, skb, &gop, &gop_copy))) {
 			netdev_dbg(vif->dev, "netback grant failed.\n");
 			skb_shinfo(skb)->nr_frags = 0;
 			kfree_skb(skb);
@@ -1395,19 +1395,15 @@  static int xenvif_tx_submit(struct xenvif *vif)
 		}
 
 		data_len = skb->len;
-		memcpy(skb->data,
-		       (void *)(idx_to_kaddr(vif, pending_idx)|txp->offset),
-		       data_len);
 		callback_param(vif, pending_idx).ctx = NULL;
 		if (data_len < txp->size) {
 			/* Append the packet payload as a fragment. */
 			txp->offset += data_len;
 			txp->size -= data_len;
-			skb_shinfo(skb)->destructor_arg =
-				&callback_param(vif, pending_idx);
 		} else {
 			/* Schedule a response immediately. */
-			xenvif_idx_unmap(vif, pending_idx);
+			xenvif_idx_release(vif, pending_idx,
+					   XEN_NETIF_RSP_OKAY);
 		}
 
 		if (txp->flags & XEN_NETTXF_csum_blank)
@@ -1586,17 +1582,19 @@  static inline void xenvif_tx_dealloc_action(struct xenvif *vif)
 /* Called after netfront has transmitted */
 int xenvif_tx_action(struct xenvif *vif, int budget)
 {
-	unsigned nr_gops;
+	unsigned nr_gops, nr_cops = 0;
 	int work_done, ret;
 
 	if (unlikely(!tx_work_todo(vif)))
 		return 0;
 
-	nr_gops = xenvif_tx_build_gops(vif, budget);
+	nr_gops = xenvif_tx_build_gops(vif, budget, &nr_cops);
 
-	if (nr_gops == 0)
+	if ((nr_cops == 0) && (nr_gops == 0))
 		return 0;
 
+	gnttab_batch_copy(vif->tx_copy_ops, nr_cops);
+
 	ret = gnttab_map_refs(vif->tx_map_ops,
 			      NULL,
 			      vif->pages_to_map,