diff mbox

[04/11] IB/cm: Expose DGID in SIDR request events

Message ID 1434358036-15526-5-git-send-email-haggaie@mellanox.com
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Haggai Eran June 15, 2015, 8:47 a.m. UTC
The destination GID can be used to uniquely resolve the request. Expose the
GID in SIDR request events when it is available, so that the rdma_cm module
can use that information.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
---
 drivers/infiniband/core/cm.c | 7 +++++++
 include/rdma/ib_cm.h         | 2 ++
 2 files changed, 9 insertions(+)

Comments

Sean Hefty June 15, 2015, 9:32 p.m. UTC | #1
>  drivers/infiniband/core/cm.c | 7 +++++++
>  include/rdma/ib_cm.h         | 2 ++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index c5f5f89e274a..46f99ec4080a 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -2983,6 +2983,13 @@ static void cm_format_sidr_req_event(struct cm_work
> *work,
>  	param->pkey = __be16_to_cpu(sidr_req_msg->pkey);
>  	param->listen_id = listen_id;
>  	param->service_id = sidr_req_msg->service_id;
> +	if (work->mad_recv_wc->wc->wc_flags & IB_WC_GRH) {
> +		param->grh = 1;
> +		memcpy(&param->dgid, &work->mad_recv_wc->recv_buf.grh->dgid,
> +		       sizeof(param->dgid));
> +	} else {
> +		param->grh = 0;

What is the use case here?  Are you trying to sort by device?  How does the GID of the GMP relate to the listen?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe June 15, 2015, 10:08 p.m. UTC | #2
On Mon, Jun 15, 2015 at 09:32:53PM +0000, Hefty, Sean wrote:
> >  drivers/infiniband/core/cm.c | 7 +++++++
> >  include/rdma/ib_cm.h         | 2 ++
> >  2 files changed, 9 insertions(+)
> > 
> > diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> > index c5f5f89e274a..46f99ec4080a 100644
> > +++ b/drivers/infiniband/core/cm.c
> > @@ -2983,6 +2983,13 @@ static void cm_format_sidr_req_event(struct cm_work
> > *work,
> >  	param->pkey = __be16_to_cpu(sidr_req_msg->pkey);
> >  	param->listen_id = listen_id;
> >  	param->service_id = sidr_req_msg->service_id;
> > +	if (work->mad_recv_wc->wc->wc_flags & IB_WC_GRH) {
> > +		param->grh = 1;
> > +		memcpy(&param->dgid, &work->mad_recv_wc->recv_buf.grh->dgid,
> > +		       sizeof(param->dgid));
> > +	} else {
> > +		param->grh = 0;
> 
> What is the use case here?  Are you trying to sort by device?  How does the GID of the GMP relate to the listen?

Ouch, that is getting fugly, it used by cma_save_req_info, which feeds
the data into ib_get_net_dev_by_params - basically it chooses the
alias GUID'd netdev to use.

But how is that going to work? How is the sender to know it should be
sending a GRH with the CM message?

Falling back to use the primary_path sgid seems like a poor
substitute, if APM is being used that might be a totally different
port than the CM message.

I'm also not sure about the pkey, it seems to me that the pkey used to
select the ingress netdevice should be the pkey of the rx'd CM GMP,
not the pkey of the future RDMA channel, so this looks like it should
change:

+static int cma_save_req_info(const struct ib_cm_event *ib_event,
+			     struct cma_req_info *req)
[..]
+	switch (ib_event->event) {
+	case IB_CM_REQ_RECEIVED:
+		req->pkey	= be16_to_cpu(req_param->primary_path->pkey);
[..]
+	case IB_CM_SIDR_REQ_RECEIVED:
+		req->pkey	= sidr_param->pkey;

Some comment for the GID, if the GRH is present, then the DGID from
there should alwas be used, not the content of the REQ.

All this is because the CM IP protocol didn't include the LLADDR of
the target's IPoIB interface.. If we are already looking at a breaking
change to force GRH, how hard would it be to add on the LLADDR
somehow instead?

Jason
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haggai Eran June 16, 2015, 6:51 a.m. UTC | #3
On 16/06/2015 00:32, Hefty, Sean wrote:
>>  drivers/infiniband/core/cm.c | 7 +++++++
>>  include/rdma/ib_cm.h         | 2 ++
>>  2 files changed, 9 insertions(+)
>>
>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>> index c5f5f89e274a..46f99ec4080a 100644
>> --- a/drivers/infiniband/core/cm.c
>> +++ b/drivers/infiniband/core/cm.c
>> @@ -2983,6 +2983,13 @@ static void cm_format_sidr_req_event(struct cm_work
>> *work,
>>  	param->pkey = __be16_to_cpu(sidr_req_msg->pkey);
>>  	param->listen_id = listen_id;
>>  	param->service_id = sidr_req_msg->service_id;
>> +	if (work->mad_recv_wc->wc->wc_flags & IB_WC_GRH) {
>> +		param->grh = 1;
>> +		memcpy(&param->dgid, &work->mad_recv_wc->recv_buf.grh->dgid,
>> +		       sizeof(param->dgid));
>> +	} else {
>> +		param->grh = 0;
> 
> What is the use case here?  Are you trying to sort by device?  How does the GID of the GMP relate to the listen?

The idea is to allow SIDR request to be sorted by the GID, when we will
have alias GIDs for IPoIB.

Unlike the CM requests, SIDR requests do not contain the remote GID, so
I thought we could use the GID from the GRH and turn on GRH on such systems.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haggai Eran June 16, 2015, 11:25 a.m. UTC | #4
On 16/06/2015 01:08, Jason Gunthorpe wrote:
> On Mon, Jun 15, 2015 at 09:32:53PM +0000, Hefty, Sean wrote:
>>>  drivers/infiniband/core/cm.c | 7 +++++++
>>>  include/rdma/ib_cm.h         | 2 ++
>>>  2 files changed, 9 insertions(+)
>>>
>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>> index c5f5f89e274a..46f99ec4080a 100644
>>> +++ b/drivers/infiniband/core/cm.c
>>> @@ -2983,6 +2983,13 @@ static void cm_format_sidr_req_event(struct cm_work
>>> *work,
>>>  	param->pkey = __be16_to_cpu(sidr_req_msg->pkey);
>>>  	param->listen_id = listen_id;
>>>  	param->service_id = sidr_req_msg->service_id;
>>> +	if (work->mad_recv_wc->wc->wc_flags & IB_WC_GRH) {
>>> +		param->grh = 1;
>>> +		memcpy(&param->dgid, &work->mad_recv_wc->recv_buf.grh->dgid,
>>> +		       sizeof(param->dgid));
>>> +	} else {
>>> +		param->grh = 0;
>>
>> What is the use case here?  Are you trying to sort by device?  How does the GID of the GMP relate to the listen?
> 
> Ouch, that is getting fugly, it used by cma_save_req_info, which feeds
> the data into ib_get_net_dev_by_params - basically it chooses the
> alias GUID'd netdev to use.
> 
> But how is that going to work? How is the sender to know it should be
> sending a GRH with the CM message?

If the admin wants to use SIDR with alias GIDs, they will need to
configure the system to enable GRH for such GMPs. (This series doesn't
include such a patch).

> 
> Falling back to use the primary_path sgid seems like a poor
> substitute, if APM is being used that might be a totally different
> port than the CM message.

Note that the patchset uses primary_path for CM requests always (not as
a fallback), and uses GRH as a fallback for SIDR requests.

Regarding APM, currently the ib_cm code always sends the GMP to the
primary path anyway, right? And in any case, one would expect the
primary path's GID to have a valid net_device and local routing rules,
so I think for the purpose of demuxing and validating the request using
the primary path should be fine.

> 
> I'm also not sure about the pkey, it seems to me that the pkey used to
> select the ingress netdevice should be the pkey of the rx'd CM GMP,
> not the pkey of the future RDMA channel, so this looks like it should
> change:
> 
> +static int cma_save_req_info(const struct ib_cm_event *ib_event,
> +			     struct cma_req_info *req)
> [..]
> +	switch (ib_event->event) {
> +	case IB_CM_REQ_RECEIVED:
> +		req->pkey	= be16_to_cpu(req_param->primary_path->pkey);
> [..]
> +	case IB_CM_SIDR_REQ_RECEIVED:
> +		req->pkey	= sidr_param->pkey;
> 
> Some comment for the GID, if the GRH is present, then the DGID from
> there should alwas be used, not the content of the REQ.

Why do you think the GMP's net_device should be used over the one of the
future RDMA channel?

Thinking about the network namespaces implications, I'm having trouble
thinking of a good use case where a port redirector is in one namespace
and the future RDMA channel is in another namespace.

> 
> All this is because the CM IP protocol didn't include the LLADDR of
> the target's IPoIB interface.. If we are already looking at a breaking
> change to force GRH, how hard would it be to add on the LLADDR
> somehow instead?

So far we can work without GRH for CM requests, and also without GRH for
SIDR requests if we rely on layer 3 for the interface resolution. I'm
not against adding a LLADDR to the protocol somehow, but I don't think
we should abandon all these use cases and the interoperability with
existing software.

Regards,
Haggai
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sean Hefty June 16, 2015, 4:47 p.m. UTC | #5
> The idea is to allow SIDR request to be sorted by the GID, when we will
> have alias GIDs for IPoIB.

Please limit this series, or at least the early patches in this series, to simply moving the de-mux out of the ib_cm and into the rdma_cm.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haggai Eran June 17, 2015, 10:45 a.m. UTC | #6
On Tuesday, June 16, 2015 7:47 PM, Hefty, Sean <sean.hefty@intel.com> wrote:
> To: Haggai Eran; Doug Ledford
> Cc: linux-rdma@vger.kernel.org; netdev@vger.kernel.org; Liran Liss; Guy Shapiro; Shachar Raindel; Yotam Kenneth; Jason Gunthorpe
> Subject: RE: [PATCH 04/11] IB/cm: Expose DGID in SIDR request events
> 
>> The idea is to allow SIDR request to be sorted by the GID, when we will
>> have alias GIDs for IPoIB.
> 
> Please limit this series, or at least the early patches in this series, to simply moving the de-mux out of the ib_cm and into the rdma_cm.

Fair enough. I'll remove this patch and the related code from this series.

Haggai
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe June 17, 2015, 5:06 p.m. UTC | #7
On Tue, Jun 16, 2015 at 02:25:07PM +0300, Haggai Eran wrote:

> > But how is that going to work? How is the sender to know it should be
> > sending a GRH with the CM message?
> 
> If the admin wants to use SIDR with alias GIDs, they will need to
> configure the system to enable GRH for such GMPs. (This series doesn't
> include such a patch).

Gross.

> Regarding APM, currently the ib_cm code always sends the GMP to the
> primary path anyway, right? And in any case, one would expect the
> primary path's GID to have a valid net_device and local routing rules,
> so I think for the purpose of demuxing and validating the request using
> the primary path should be fine.

The current code works that way, but it is not what I'd expect
generally.

For instance, future APM support will be able to drive dual-rail and
policy will decide which rail is the current best rail for data
transfer. So the GMP may be directed to the IPoIB device with port 1,
but the data transfer may happen on the RDMA port 2. [Note, I already
have very rough patches that do this de-coupling]

> Why do you think the GMP's net_device should be used over the one of the
> future RDMA channel?

The code needs to match the incoming GMP with the logical netdev that
rx's *that GMP*. The fact that goes on to setup an RDMA channel is not
relevant, the nature of the future RDMA channel should not impact how
the GMP is recieved.

> So far we can work without GRH for CM requests, and also without GRH for
> SIDR requests if we rely on layer 3 for the interface resolution. I'm
> not against adding a LLADDR to the protocol somehow, but I don't think
> we should abandon all these use cases and the interoperability with
> existing software.

Well, there is a middle ground. Lets say we get the LLADDR in the GMP
somehow, then we get 100% correct operation when it is present.

For degraded operation we have the (device,port,pkey) and possibly
(device,port,pkey,gid) if there was a GRH. We also have the IP address
hack.

So, I'd say, search in this sequence:
 - If the LLADDR is present, just find the right netdev
 - Otherwise search for (device,port,pkey) / (device,port,pkey,gid)
   If there is only one match then that is unambiguously the correct
   device to use.
 - Repeat the above search, but add the IP address. This is the hack
   we perform for compatibility.

There is no reason to hackily look at the GMP path parameters if we are
relying on #3 above as the hack to save us in the legacy ambiguous case.

.. and to answer your question in the other email, I think we should
keep the hack clearly distinct from the proper operation (in fact,
perhaps it should be user configurable). So #3 should be a distinct
step taken when all else fails, not integrated into earlier steps.

So, this series as it stands just needs to do #2/#3 above and you guys
can figure out the LLADDR business later.

Jason
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haggai Eran June 18, 2015, 10:41 a.m. UTC | #8
On 17/06/2015 20:06, Jason Gunthorpe wrote:
> On Tue, Jun 16, 2015 at 02:25:07PM +0300, Haggai Eran wrote:

>> Regarding APM, currently the ib_cm code always sends the GMP to the
>> primary path anyway, right? And in any case, one would expect the
>> primary path's GID to have a valid net_device and local routing rules,
>> so I think for the purpose of demuxing and validating the request using
>> the primary path should be fine.
> 
> The current code works that way, but it is not what I'd expect
> generally.
> 
> For instance, future APM support will be able to drive dual-rail and
> policy will decide which rail is the current best rail for data
> transfer. So the GMP may be directed to the IPoIB device with port 1,
> but the data transfer may happen on the RDMA port 2. [Note, I already
> have very rough patches that do this de-coupling]
> 
>> Why do you think the GMP's net_device should be used over the one of the
>> future RDMA channel?
> 
> The code needs to match the incoming GMP with the logical netdev that
> rx's *that GMP*. The fact that goes on to setup an RDMA channel is not
> relevant, the nature of the future RDMA channel should not impact how
> the GMP is recieved.

From what I understand, ib_cm and rdma_cm keeps their own addresses. I
thought that ib_cm's addresses would be used to handle GMPs, and the
rdma_cm addresses (id.route.addr) to represent the created RDMA channel.
After all, that is what ucma_query_addr returns. So are you proposing
that we use the logical netdev that was resolved by the GMP to fill up
the source address returned to user-space? It sounds like it would
prevent the APM usage you described above.

> 
>> So far we can work without GRH for CM requests, and also without GRH for
>> SIDR requests if we rely on layer 3 for the interface resolution. I'm
>> not against adding a LLADDR to the protocol somehow, but I don't think
>> we should abandon all these use cases and the interoperability with
>> existing software.
> 
> Well, there is a middle ground. Lets say we get the LLADDR in the GMP
> somehow, then we get 100% correct operation when it is present.
> 
> For degraded operation we have the (device,port,pkey) and possibly
> (device,port,pkey,gid) if there was a GRH. We also have the IP address
> hack.
> 
> So, I'd say, search in this sequence:
>  - If the LLADDR is present, just find the right netdev
>  - Otherwise search for (device,port,pkey) / (device,port,pkey,gid)
>    If there is only one match then that is unambiguously the correct
>    device to use.
>  - Repeat the above search, but add the IP address. This is the hack
>    we perform for compatibility.
> 
> There is no reason to hackily look at the GMP path parameters if we are
> relying on #3 above as the hack to save us in the legacy ambiguous case.
> 
> .. and to answer your question in the other email, I think we should
> keep the hack clearly distinct from the proper operation (in fact,
> perhaps it should be user configurable). So #3 should be a distinct
> step taken when all else fails, not integrated into earlier steps.
> 
> So, this series as it stands just needs to do #2/#3 above and you guys
> can figure out the LLADDR business later.

Okay. I can add a first search without the IP address.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index c5f5f89e274a..46f99ec4080a 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -2983,6 +2983,13 @@  static void cm_format_sidr_req_event(struct cm_work *work,
 	param->pkey = __be16_to_cpu(sidr_req_msg->pkey);
 	param->listen_id = listen_id;
 	param->service_id = sidr_req_msg->service_id;
+	if (work->mad_recv_wc->wc->wc_flags & IB_WC_GRH) {
+		param->grh = 1;
+		memcpy(&param->dgid, &work->mad_recv_wc->recv_buf.grh->dgid,
+		       sizeof(param->dgid));
+	} else {
+		param->grh = 0;
+	}
 	param->port = work->port->port_num;
 	work->cm_event.private_data = &sidr_req_msg->private_data;
 }
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index 1b567bbc3ad4..3a5d70d79790 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -224,6 +224,8 @@  struct ib_cm_apr_event_param {
 struct ib_cm_sidr_req_event_param {
 	struct ib_cm_id		*listen_id;
 	__be64			service_id;
+	union ib_gid		dgid;
+	u8			grh:1;
 	u8			port;
 	u16			pkey;
 };