Message ID | 20161014073821.qvkpjtwraqnwzzs3@linux-x5ow.site (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote: > On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote: > > Hm, still behaves for me like I reported for v2: > > http://marc.info/?l=linux-scsi&m=147637177902937&w=2 > > Hi Steffen, > > Can you please try the following on top of 2/16? > > diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c > index 4149dac..baebaab 100644 > --- a/drivers/scsi/scsi_transport_fc.c > +++ b/drivers/scsi/scsi_transport_fc.c > @@ -3786,6 +3786,12 @@ enum fc_dispatch_result { > int cmdlen = sizeof(uint32_t); /* start with length of msgcode */ > int ret; > > + /* check if we really have all the request data needed */ > + if (job->request_len < cmdlen) { > + ret = -ENOMSG; > + goto fail_host_msg; > + } > + > /* Validate the host command */ > switch (bsg_request->msgcode) { > case FC_BSG_HST_ADD_RPORT: > @@ -3831,12 +3837,6 @@ enum fc_dispatch_result { > goto fail_host_msg; > } > > - /* check if we really have all the request data needed */ > - if (job->request_len < cmdlen) { > - ret = -ENOMSG; > - goto fail_host_msg; > - } > - > ret = i->f->bsg_request(job); > if (!ret) > return FC_DISPATCH_UNLOCKED; > @@ -3887,6 +3887,12 @@ enum fc_dispatch_result { > int cmdlen = sizeof(uint32_t); /* start with length of msgcode */ > int ret; > > + /* check if we really have all the request data needed */ > + if (job->request_len < cmdlen) { > + ret = -ENOMSG; > + goto fail_rport_msg; > + } > + > /* Validate the rport command */ > switch (bsg_request->msgcode) { > case FC_BSG_RPT_ELS: > > > > The rational behind this is, in fc_req_to_bsgjob() we're assigning > job->request as req->cmd and job->request_len = req->cmd_len. But without > checkinf job->request_len we don't know whether we're save to touch > job->request (a.k.a. bsg_request). Hi Steffen, Did you have any chance testing this? I hacked fcping to work with non-FCoE and rports as well and tested with FCoE and lpfc. No problems seen from my side. I've also pused the series (With this change folded in) to my git tree at [1] if this helps you in any way. [1] https://git.kernel.org/cgit/linux/kernel/git/jth/linux.git/log/?h=scsi-bsg-rewrite-v4 Thanks a lot, Johannes
On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote: > On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote: > > Hm, still behaves for me like I reported for v2: > > http://marc.info/?l=linux-scsi&m=147637177902937&w=2 > > Hi Steffen, > > Can you please try the following on top of 2/16? How does this fit into the patch we're replying to?
On Thu, Nov 03, 2016 at 08:17:51AM -0700, Christoph Hellwig wrote: > On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote: > > On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote: > > > Hm, still behaves for me like I reported for v2: > > > http://marc.info/?l=linux-scsi&m=147637177902937&w=2 > > > > Hi Steffen, > > > > Can you please try the following on top of 2/16? > > How does this fit into the patch we're replying to? Sorry but I don't quite get you. If you look at a mixed source assembly listing for fc_host_dispatch() [1] we load the the offset of 24 from %r12 into %r1 and then the beginning of %r1 into %r2 and crash. So if we now check if job->request_len is smaller than uint32_t we know we can't have a messagecode in the request and bail out early, instead of accessing a possible NULL pointer. At least that's my analysis, feel free to correct me if I'm wrong. [1] Listing: static int fc_bsg_host_dispatch(struct Scsi_Host *shost, struct bsg_job *job) { struct fc_internal *i = to_fc_internal(shost->transportt); struct fc_bsg_request *bsg_request = job->request; 5d54: e3 10 c0 18 00 04 lg %r1,24(%r12) struct fc_bsg_reply *bsg_reply = job->reply; 5d5a: e3 b0 c0 20 00 04 lg %r11,32(%r12) int cmdlen = sizeof(uint32_t); /* start with length of msgcode */ int ret; /* check if we really have all the request data needed */ if (job->request_len < cmdlen) { 5d60: a7 c4 00 48 jle 5df0 <fc_bsg_dispatch+0x1f0> ret = -ENOMSG; goto fail_host_msg; } /* Validate the host command */ switch (bsg_request->msgcode) { 5d64: 58 20 10 00 l %r2,0(%r1) 5d68: c2 2f 80 00 00 03 clfi %r2,2147483651 5d6e: a7 84 00 1a je 5da2 <fc_bsg_dispatch+0x1a2> 5d72: a7 24 00 0a jh 5d86 <fc_bsg_dispatch+0x186> 5d76: c0 19 80 00 00 01 iilf %r1,2147483649 5d7c: ec 21 00 2b a0 77 clrj %r2,%r1,10,5dd2 <fc_bsg_dispatch+0x1d2> 5d82: a7 f4 00 3b j 5df8 <fc_bsg_dispatch+0x1f8> 5d86: c0 59 80 00 00 04 iilf %r5,2147483652 5d8c: ec 25 00 0b 80 76 crj %r2,%r5,8,5da2 <fc_bsg_dispatch+0x1a2> 5d92: c0 59 80 00 00 ff iilf %r5,2147483903 5d98: ec 25 00 13 80 76 crj %r2,%r5,8,5dbe <fc_bsg_dispatch+0x1be> 5d9e: a7 f4 00 2d j 5df8 <fc_bsg_dispatch+0x1f8> Thanks, Johannes
On Tue, Oct 25, 2016 at 09:43:14AM +0200, Johannes Thumshirn wrote: > On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote: > > On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote: > > > Hm, still behaves for me like I reported for v2: > > > http://marc.info/?l=linux-scsi&m=147637177902937&w=2 [...] > > > > The rational behind this is, in fc_req_to_bsgjob() we're assigning > > job->request as req->cmd and job->request_len = req->cmd_len. But without > > checkinf job->request_len we don't know whether we're save to touch > > job->request (a.k.a. bsg_request). > > Hi Steffen, > Did you have any chance testing this? I hacked fcping to work with non-FCoE > and rports as well and tested with FCoE and lpfc. No problems seen from my > side. I've also pused the series (With this change folded in) to my git > tree at [1] if this helps you in any way. > > [1] https://git.kernel.org/cgit/linux/kernel/git/jth/linux.git/log/?h=scsi-bsg-rewrite-v4 > So I finally have a test system up and running. I have good and bad news. The good news is, I can't get the system crashing with my patches, the bad news is I can't get zfcp_ping and zfcp_show to output something but HBA_STATUS_ERROR with my patches and without. And btw, I renamed the branch to fc-bsg-rewrite-v4 in case you want to clone from it (it has patch 2/16 changed to the v3 submission). Can you please have a look with your setup? Thanks, Johannes
Hi Johannes, On 11/15/2016 12:56 PM, Johannes Thumshirn wrote: > On Tue, Oct 25, 2016 at 09:43:14AM +0200, Johannes Thumshirn wrote: >> On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote: >>> On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote: >>>> Hm, still behaves for me like I reported for v2: >>>> http://marc.info/?l=linux-scsi&m=147637177902937&w=2 > > [...] > >>> >>> The rational behind this is, in fc_req_to_bsgjob() we're assigning >>> job->request as req->cmd and job->request_len = req->cmd_len. But without >>> checkinf job->request_len we don't know whether we're save to touch >>> job->request (a.k.a. bsg_request). >> >> Hi Steffen, >> Did you have any chance testing this? I hacked fcping to work with non-FCoE >> and rports as well and tested with FCoE and lpfc. No problems seen from my >> side. I've also pused the series (With this change folded in) to my git >> tree at [1] if this helps you in any way. >> >> [1] https://git.kernel.org/cgit/linux/kernel/git/jth/linux.git/log/?h=scsi-bsg-rewrite-v4 >> > > So I finally have a test system up and running. I have good and bad news. The > good news is, I can't get the system crashing with my patches, the bad news is > I can't get zfcp_ping and zfcp_show to output something but HBA_STATUS_ERROR > with my patches and without. Assuming you run the latest package version on s390x: Do steps 2 and 3 of the procedure in http://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.lhdd/lhdd_t_fcp_api_runappl.html help? The only other thing I can think of from the top of my head is that BSG ioctls are sensitive regarding ABI and I once had the kernel ioctl return EINVAL due to unmatching kernel-headers and libzfcphbaapi maps this EINVAL to HBA_STATUS_ERROR because there is no more specifically suitable HBA constant [old SUSE bugs 834498 and 834500]. > And btw, I renamed the branch to fc-bsg-rewrite-v4 in case you want to clone > from it (it has patch 2/16 changed to the v3 submission). > > Can you please have a look with your setup? I'm going to re-test hopefully within the next few days.
On Tue, Nov 15, 2016 at 03:31:27PM +0100, Steffen Maier wrote: > Hi Johannes, > > On 11/15/2016 12:56 PM, Johannes Thumshirn wrote: > > On Tue, Oct 25, 2016 at 09:43:14AM +0200, Johannes Thumshirn wrote: > > > On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote: > > > > On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote: > > > > > Hm, still behaves for me like I reported for v2: > > > > > http://marc.info/?l=linux-scsi&m=147637177902937&w=2 > > > > [...] > > > > > > > > > > The rational behind this is, in fc_req_to_bsgjob() we're assigning > > > > job->request as req->cmd and job->request_len = req->cmd_len. But without > > > > checkinf job->request_len we don't know whether we're save to touch > > > > job->request (a.k.a. bsg_request). > > > > > > Hi Steffen, > > > Did you have any chance testing this? I hacked fcping to work with non-FCoE > > > and rports as well and tested with FCoE and lpfc. No problems seen from my > > > side. I've also pused the series (With this change folded in) to my git > > > tree at [1] if this helps you in any way. > > > > > > [1] https://git.kernel.org/cgit/linux/kernel/git/jth/linux.git/log/?h=scsi-bsg-rewrite-v4 > > > > > > > So I finally have a test system up and running. I have good and bad news. The > > good news is, I can't get the system crashing with my patches, the bad news is > > I can't get zfcp_ping and zfcp_show to output something but HBA_STATUS_ERROR > > with my patches and without. > > Assuming you run the latest package version on s390x: > > Do steps 2 and 3 of the procedure in > http://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.lhdd/lhdd_t_fcp_api_runappl.html > help? No, I do have a correct /etc/hba.conf: $ grep libzfcphbaapi /etc/hba.conf com.ibm.libzfcphbaapi /usr/lib64/libzfcphbaapi.so.0 and bumping the log level via export LIB_ZFCP_HBAAPI_LOG_LEVEL=1 just adds '(vlib.c:105)_initvlib: libzfcphbaapi.so loaded at Nov 15 15:43:23' to the 1st line of zfcp_ping output. Here's the full output: $ zfcp_ping -v -a 0.0.fc00 -d 0x50050763051b473a (vlib.c:105)_initvlib: libzfcphbaapi.so loaded at Nov 15 15:45:33 Sending PNG from BUS_ID=0.0.fc00 WWPN=0xc05076e0f3002344 ID=0x760c1c dev=/dev/bsg/fc_host1 speed=8 GBit/s --- REQUEST cmd = 0x0401 --- 03 00 00 00 fa 01 00 00 04 01 00 01 00 00 00 00 00 00 00 01 00 02 00 08 50 05 07 63 05 1b 47 3a 00 00 00 00 --- RESPONSE rc = 0x1 --- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --- REQUEST cmd = 0x0401 --- 03 00 00 00 fa 01 00 00 04 01 00 01 00 00 00 00 00 00 00 01 00 02 00 08 50 05 07 63 05 1b 47 3a 00 00 00 00 --- RESPONSE rc = 0x1 --- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --- REQUEST cmd = 0x0401 --- 03 00 00 00 fa 01 00 00 04 01 00 01 00 00 00 00 00 00 00 01 00 02 00 08 50 05 07 63 05 1b 47 3a 00 00 00 00 --- RESPONSE rc = 0x1 --- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Error received for FPNG request, aborting. ---------- ping statistics ----------- min/avg/max = 0.000/0.000/0.000 ms -------------------------------------- I agree this looks a bit suspicious and I'll add some debugs in the kernel to see what's going on. > > The only other thing I can think of from the top of my head is that BSG > ioctls are sensitive regarding ABI and I once had the kernel ioctl return > EINVAL due to unmatching kernel-headers and libzfcphbaapi maps this EINVAL > to HBA_STATUS_ERROR because there is no more specifically suitable HBA > constant [old SUSE bugs 834498 and 834500]. > > > And btw, I renamed the branch to fc-bsg-rewrite-v4 in case you want to clone > > from it (it has patch 2/16 changed to the v3 submission). > > > > Can you please have a look with your setup? > > I'm going to re-test hopefully within the next few days. That'll be great, thanks. Byte, Johannes
On Tue, Nov 15, 2016 at 03:31:27PM +0100, Steffen Maier wrote: > Hi Johannes, > > On 11/15/2016 12:56 PM, Johannes Thumshirn wrote: > > On Tue, Oct 25, 2016 at 09:43:14AM +0200, Johannes Thumshirn wrote: > > > On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote: > > > > On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote: > > > > > Hm, still behaves for me like I reported for v2: > > > > > http://marc.info/?l=linux-scsi&m=147637177902937&w=2 > > > > [...] > > > > > > > > > > The rational behind this is, in fc_req_to_bsgjob() we're assigning > > > > job->request as req->cmd and job->request_len = req->cmd_len. But without > > > > checkinf job->request_len we don't know whether we're save to touch > > > > job->request (a.k.a. bsg_request). > > > > > > Hi Steffen, > > > Did you have any chance testing this? I hacked fcping to work with non-FCoE > > > and rports as well and tested with FCoE and lpfc. No problems seen from my > > > side. I've also pused the series (With this change folded in) to my git > > > tree at [1] if this helps you in any way. > > > > > > [1] https://git.kernel.org/cgit/linux/kernel/git/jth/linux.git/log/?h=scsi-bsg-rewrite-v4 > > > > > > > So I finally have a test system up and running. I have good and bad news. The > > good news is, I can't get the system crashing with my patches, the bad news is > > I can't get zfcp_ping and zfcp_show to output something but HBA_STATUS_ERROR > > with my patches and without. Please ignore my last mails, apparently it's a wise idea to check which user id one has before running zfcp_ping... The good news for this is, I can now recreate the crashes you have and thus have a chance to fix them :-) Byte, Johannes
On Tue, Nov 15, 2016 at 04:39:33PM +0100, Johannes Thumshirn wrote: > On Tue, Nov 15, 2016 at 03:31:27PM +0100, Steffen Maier wrote: > > Hi Johannes, > > > > On 11/15/2016 12:56 PM, Johannes Thumshirn wrote: > > > On Tue, Oct 25, 2016 at 09:43:14AM +0200, Johannes Thumshirn wrote: > > > > On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote: > > > > > On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote: > > > > > > Hm, still behaves for me like I reported for v2: > > > > > > http://marc.info/?l=linux-scsi&m=147637177902937&w=2 > > > > > > [...] > > > > > > > > > > > > > The rational behind this is, in fc_req_to_bsgjob() we're assigning > > > > > job->request as req->cmd and job->request_len = req->cmd_len. But without > > > > > checkinf job->request_len we don't know whether we're save to touch > > > > > job->request (a.k.a. bsg_request). > > > > > > > > Hi Steffen, > > > > Did you have any chance testing this? I hacked fcping to work with non-FCoE > > > > and rports as well and tested with FCoE and lpfc. No problems seen from my > > > > side. I've also pused the series (With this change folded in) to my git > > > > tree at [1] if this helps you in any way. > > > > > > > > [1] https://git.kernel.org/cgit/linux/kernel/git/jth/linux.git/log/?h=scsi-bsg-rewrite-v4 > > > > > > > > > > So I finally have a test system up and running. I have good and bad news. The > > > good news is, I can't get the system crashing with my patches, the bad news is > > > I can't get zfcp_ping and zfcp_show to output something but HBA_STATUS_ERROR > > > with my patches and without. > > Please ignore my last mails, apparently it's a wise idea to check which user > id one has before running zfcp_ping... > > The good news for this is, I can now recreate the crashes you have and thus > have a chance to fix them :-) So JFTR, I was able to fix the 1 problem introduced by this patch, it's the follwoing hunk: @@ -3726,9 +3729,9 @@ fc_req_to_bsgjob(struct Scsi_Host *shost, struct fc_rport *rport, if (i->f->dd_bsg_size) job->dd_data = (void *)&job[1]; spin_lock_init(&job->job_lock); - job->request = (struct fc_bsg_request *)req->cmd; + bsg_request = (struct fc_bsg_request *)req->cmd; job->request_len = req->cmd_len; - job->reply = req->sense; + bsg_reply = req->sense; job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer * allocated */ if (req->bio) { But as fc_req_to_bsgjob() get's deleted in Patch 15/16 the problem is re-introduced. Unfortunately the fix isn't as trivial as for 2/16 so I'm still trying to nail it down. Thanks, Johannes
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c index 4149dac..baebaab 100644 --- a/drivers/scsi/scsi_transport_fc.c +++ b/drivers/scsi/scsi_transport_fc.c @@ -3786,6 +3786,12 @@ enum fc_dispatch_result { int cmdlen = sizeof(uint32_t); /* start with length of msgcode */ int ret; + /* check if we really have all the request data needed */ + if (job->request_len < cmdlen) { + ret = -ENOMSG; + goto fail_host_msg; + } + /* Validate the host command */ switch (bsg_request->msgcode) { case FC_BSG_HST_ADD_RPORT: @@ -3831,12 +3837,6 @@ enum fc_dispatch_result { goto fail_host_msg; } - /* check if we really have all the request data needed */ - if (job->request_len < cmdlen) { - ret = -ENOMSG; - goto fail_host_msg; - } - ret = i->f->bsg_request(job); if (!ret) return FC_DISPATCH_UNLOCKED; @@ -3887,6 +3887,12 @@ enum fc_dispatch_result { int cmdlen = sizeof(uint32_t); /* start with length of msgcode */ int ret; + /* check if we really have all the request data needed */ + if (job->request_len < cmdlen) { + ret = -ENOMSG; + goto fail_rport_msg; + } + /* Validate the rport command */ switch (bsg_request->msgcode) { case FC_BSG_RPT_ELS: