Message ID | 20210115120558.29313-1-minwoo.im.dev@gmail.com |
---|---|
Series | hw/block/nvme: support multi-path for ctrl/ns |
On Jan 15 21:05, Minwoo Im wrote:
> Hello,
>
> This series added support for multi-path I/O with multi-controllers and
> namespace sharing.  By supporting these features, we can test Linux
> kernel mpath (multi-path) code with this NVMe device.
>
> Patches from the first to third added multi-controller support in a NVM
> subsystem by adding a mpath.ctrl parameter to nvme device.  The rest of
> the patches added namespace sharing support in a NVM subsystem with two
> or more controllers by adding mpath.ns parameter to nvme-ns device.
>
> Multi-path enabled in kernel with this series for two controllers with a
> namespace:
>
> root@vm:~/work# nvme list -v
> NVM Express Subsystems
>
> Subsystem        Subsystem-NQN                 Controllers
> ---------------- ----------------------------- ----------------
> nvme-subsys0     nqn.2019-08.org.qemu:serial   nvme0, nvme1
>
> NVM Express Controllers
>
> Device   SN       MN               FR       TxPort Address        Subsystem    Namespaces
> -------- -------- ---------------- -------- ------ -------------- ------------ ----------
> nvme0    serial   QEMU NVMe Ctrl   1.0      pcie   0000:01:00.0   nvme-subsys0 nvme0n1
> nvme1    serial   QEMU NVMe Ctrl   1.0      pcie   0000:02:00.0   nvme-subsys0 nvme0n1
>
> NVM Express Namespaces
>
> Device       NSID     Usage                      Format           Controllers
> ------------ -------- -------------------------- ---------------- ----------------
> nvme0n1      1        268.44 MB / 268.44 MB      512 B + 0 B      nvme0, nvme1
>
> The reason why I put 'RFC' tag to this series is mostly about the last
> patch "hw/block/nvme: add namespace sharing param for mpath".  It seems
> like a QEMU block backing device does not support being shared among two
> or more -device(s).  It means that we just can't give the same drive=
> property to multiple nvme-ns devices.
>
> This patch has just let -device map to -drive one-to-one (1:1), but if
> namespace sharing is detected and set up by the host kernel, then a
> single block device will be selected for the NVM subsystem.  I'm not
> sure this is a good start for this feature, so I put the RFC tag here.
>
> Please kindly review!
>

Hi Minwoo,

First - super awesome that we get this discussion going. I've been
hacking around this a couple of times, but I've never been happy with
the approach.

As you already mentioned, the problem I see with this approach is that
we have separate namespaces attached to separate controllers. So we are
faking it to the max and if I/O starts going through the other
controller we end up on a namespace that is unrelated (different data).
Havoc ensues.

My approach looks a lot like yours, but I hacked around this by adding
extra 'ctrl-0', 'ctrl-1', ..., link-parameters to the namespace device,
replacing the bus. This works well because the namespace then just
registers with multiple controllers. But adding more parameters like
that just isn't nice, so I've been trying to figure out how to allow a
parameter to be specified multiple times, so we could just do more
'ctrl'-parameters.

Alas, since I started thinking about namespace sharing I have been
regretting that I didn't reverse the bus-mechanic for namespace
attachment. It would align better with the theory of operation if it was
the controllers that attached to namespaces, and not the other way
around. So what I would actually really prefer is that we had multiple
'ns' link parameters on the controller device.

  -device nvme-ns,id=a,nsid=1,...
  -device nvme-ns,id=b,nsid=2,...
  -device nvme-ns,id=c,nsid=3,...
  -device nvme,cntlid=0,serial=foo,ns=a,ns=b
  -device nvme,cntlid=1,serial=foo,ns=a,ns=c

But I haven't been able to figure out how to kick QOM into doing this.
And I'm definitely not sure this is the way to go. Should we instead
introduce a 'nvme-subsys' device and walk a bus?

I'd really appreciate some input on how we should model this if anyone
has any thoughts. And I think we should consider stuff like detached
namespaces as well. Support for Namespace Management. The whole shebang.

On Fri, Jan 15, 2021 at 02:57:45PM +0100, Klaus Jensen wrote:
>
> As you already mentioned, the problem I see with this approach is that
> we have separate namespaces attached to separate controllers. So we are
> faking it to the max and if I/O starts going through the other
> controller we end up on a namespace that is unrelated (different data).
> Havoc ensues.
>
> My approach looks a lot like yours, but I hacked around this by adding
> extra 'ctrl-0', 'ctrl-1', ..., link-parameters to the namespace device,
> replacing the bus. This works well because the namespace then just
> registers with multiple controllers. But adding more parameters like
> that just isn't nice, so I've been trying to figure out how to allow a
> parameter to be specified multiple times, so we could just do more
> 'ctrl'-parameters.
>
> Alas, since I started thinking about namespace sharing I have been
> regretting that I didn't reverse the bus-mechanic for namespace
> attachment. It would align better with the theory of operation if it was
> the controllers that attached to namespaces, and not the other way
> around. So what I would actually really prefer is that we had multiple
> 'ns' link parameters on the controller device.

Would this work better if we introduce a new device in the nvme hierarchy:
the nvme-subsystem? You could attach multi-path namespaces and
controllers to that, and namespaces you don't want shared can attach
directly to controllers like they do today. You could also auto-assign
cntlid, and you wouldn't need to duplicate serial numbers in your
parameters.

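To make the "auto-assign cntlid" and "no duplicated serial" points concrete, here is a rough Python model of a subsystem object handing out controller IDs. It is only a sketch: the `Subsystem` and `Controller` classes, the `register` method, and the idea that the serial lives on the subsystem are assumptions for illustration, not QEMU code or an existing API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Controller:
    name: str
    cntlid: int = -1   # hypothetical: filled in by the subsystem on registration
    serial: str = ""   # hypothetical: inherited from the subsystem


@dataclass
class Subsystem:
    serial: str
    controllers: List[Controller] = field(default_factory=list)

    def register(self, ctrl: Controller) -> int:
        # Assumption: controller IDs are handed out in registration order.
        ctrl.cntlid = len(self.controllers)
        # The serial is given once, on the subsystem, and copied down.
        ctrl.serial = self.serial
        self.controllers.append(ctrl)
        return ctrl.cntlid


subsys = Subsystem(serial="foo")
nvme0 = Controller("nvme0")
nvme1 = Controller("nvme1")
subsys.register(nvme0)
subsys.register(nvme1)
```

With this shape the user would no longer pass `cntlid=` or repeat `serial=foo` on each controller; both would follow from membership in the subsystem.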
On Jan 15 09:35, Keith Busch wrote:
> On Fri, Jan 15, 2021 at 02:57:45PM +0100, Klaus Jensen wrote:
> >
> > As you already mentioned, the problem I see with this approach is that
> > we have separate namespaces attached to separate controllers. So we are
> > faking it to the max and if I/O starts going through the other
> > controller we end up on a namespace that is unrelated (different data).
> > Havoc ensues.
> >
> > My approach looks a lot like yours, but I hacked around this by adding
> > extra 'ctrl-0', 'ctrl-1', ..., link-parameters to the namespace device,
> > replacing the bus. This works well because the namespace then just
> > registers with multiple controllers. But adding more parameters like
> > that just isn't nice, so I've been trying to figure out how to allow a
> > parameter to be specified multiple times, so we could just do more
> > 'ctrl'-parameters.
> >
> > Alas, since I started thinking about namespace sharing I have been
> > regretting that I didn't reverse the bus-mechanic for namespace
> > attachment. It would align better with the theory of operation if it was
> > the controllers that attached to namespaces, and not the other way
> > around. So what I would actually really prefer is that we had multiple
> > 'ns' link parameters on the controller device.
>
> Would this work better if we introduce a new device in the nvme hierarchy:
> the nvme-subsystem? You could attach multi-path namespaces and
> controllers to that, and namespaces you don't want shared can attach
> directly to controllers like they do today. You could also auto-assign
> cntlid, and you wouldn't need to duplicate serial numbers in your
> parameters.

I kinda POC'ed that, but I think I tried to make it work with a bus and
walking it and all kinds of fancy stuff.

I think it can just be a 'link' parameter, so something like:

  -device nvme-subsys,id=subsys0
  -device nvme,id=nvme0,subsys=subsys0
  -device nvme,id=nvme1,subsys=subsys0
  -device nvme-ns,id=shared-ns1,nsid=1,subsys=subsys0
  -device nvme-ns,id=private-ns2,nsid=2,bus=nvme0

When a controller "registers" with the subsystem it attaches to all
namespaces known, and when a namespace attaches, it attaches to all
controllers known. We can even add a 'detached' bool parameter to the
namespace and keep controllers from attaching, while allowing for later
attachment.

Cool!

Question: NSIDs must be the same on each controller for shared
namespaces, but can private namespaces "share" nsid across controllers
in the subsystem? I don't think the spec is clear on that point.

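The registration rule described here (a controller attaches to all known namespaces, a namespace attaches to all known controllers, and 'detached' suppresses both) can be modelled in a few lines of Python. This is a sketch under assumptions: the class names, `register_ctrl`, `register_ns`, and `attach` are made up for illustration and do not correspond to existing QEMU functions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Namespace:
    nsid: int
    detached: bool = False                      # hypothetical 'detached' parameter
    controllers: List[str] = field(default_factory=list)


@dataclass
class Controller:
    name: str
    namespaces: Dict[int, Namespace] = field(default_factory=dict)

    def attach(self, ns: Namespace) -> None:
        self.namespaces[ns.nsid] = ns
        ns.controllers.append(self.name)


@dataclass
class Subsystem:
    controllers: List[Controller] = field(default_factory=list)
    namespaces: List[Namespace] = field(default_factory=list)

    def register_ctrl(self, ctrl: Controller) -> None:
        # A new controller attaches to every namespace already known.
        for ns in self.namespaces:
            if not ns.detached:
                ctrl.attach(ns)
        self.controllers.append(ctrl)

    def register_ns(self, ns: Namespace) -> None:
        # A new namespace attaches to every controller already known.
        if not ns.detached:
            for ctrl in self.controllers:
                ctrl.attach(ns)
        self.namespaces.append(ns)


subsys = Subsystem()
nvme0, nvme1 = Controller("nvme0"), Controller("nvme1")
ns1 = Namespace(nsid=1)
ns3 = Namespace(nsid=3, detached=True)

subsys.register_ctrl(nvme0)
subsys.register_ns(ns1)      # attaches to nvme0
subsys.register_ctrl(nvme1)  # picks up ns1 on registration
subsys.register_ns(ns3)      # known to the subsystem, attached nowhere

nvme1.attach(ns3)            # later, explicit attachment of the detached ns
```

Because both registration paths apply the same rule, the result does not depend on the order of the -device arguments.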
On Fri, Jan 15, 2021 at 06:47:20PM +0100, Klaus Jensen wrote:
> Cool!

I thought so too :)

> Question: NSIDs must be the same on each controller for shared
> namespaces, but can private namespaces "share" nsid across controllers
> in the subsystem? I don't think the spec is clear on that point.

The namespace NSID has to be unique within the entire subsystem, whether
they're shared or private.

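One way to enforce that rule is to keep the NSID table on the subsystem rather than on each controller, so a private namespace cannot reuse an NSID that is already taken anywhere in the subsystem. A minimal Python sketch, assuming a hypothetical `Subsystem.add_namespace` helper (not a QEMU function):

```python
class Subsystem:
    def __init__(self) -> None:
        # nsid -> id of the namespace device that owns it (shared or private)
        self.nsids = {}

    def add_namespace(self, ns_id: str, nsid: int) -> None:
        # NSIDs are unique subsystem-wide, regardless of sharing.
        if nsid in self.nsids:
            raise ValueError(
                f"nsid {nsid} already used by '{self.nsids[nsid]}'"
            )
        self.nsids[nsid] = ns_id


subsys = Subsystem()
subsys.add_namespace("shared-ns1", nsid=1)
subsys.add_namespace("private-ns2", nsid=2)   # private to one controller

# A private namespace on another controller may not reuse nsid 2.
try:
    subsys.add_namespace("private-ns3", nsid=2)
    rejected = False
except ValueError:
    rejected = True
```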
On 21-01-15 18:47:20, Klaus Jensen wrote:
> On Jan 15 09:35, Keith Busch wrote:
> > On Fri, Jan 15, 2021 at 02:57:45PM +0100, Klaus Jensen wrote:
> > >
> > > As you already mentioned, the problem I see with this approach is that
> > > we have separate namespaces attached to separate controllers. So we are
> > > faking it to the max and if I/O starts going through the other
> > > controller we end up on a namespace that is unrelated (different data).
> > > Havoc ensues.
> > >
> > > My approach looks a lot like yours, but I hacked around this by adding
> > > extra 'ctrl-0', 'ctrl-1', ..., link-parameters to the namespace device,
> > > replacing the bus. This works well because the namespace then just
> > > registers with multiple controllers. But adding more parameters like
> > > that just isn't nice, so I've been trying to figure out how to allow a
> > > parameter to be specified multiple times, so we could just do more
> > > 'ctrl'-parameters.
> > >
> > > Alas, since I started thinking about namespace sharing I have been
> > > regretting that I didn't reverse the bus-mechanic for namespace
> > > attachment. It would align better with the theory of operation if it was
> > > the controllers that attached to namespaces, and not the other way
> > > around. So what I would actually really prefer is that we had multiple
> > > 'ns' link parameters on the controller device.
> >
> > Would this work better if we introduce a new device in the nvme hierarchy:
> > the nvme-subsystem? You could attach multi-path namespaces and
> > controllers to that, and namespaces you don't want shared can attach
> > directly to controllers like they do today. You could also auto-assign
> > cntlid, and you wouldn't need to duplicate serial numbers in your
> > parameters.
>
> I kinda POC'ed that, but I think I tried to make it work with a bus and
> walking it and all kinds of fancy stuff.
>
> I think it can just be a 'link' parameter, so something like:
>
>   -device nvme-subsys,id=subsys0

Do we have any plan for a default subsys hierarchy? Or is it going to be
a mandatory root node of nvme controllers and namespaces?

>   -device nvme,id=nvme0,subsys=subsys0
>   -device nvme,id=nvme1,subsys=subsys0
>   -device nvme-ns,id=shared-ns1,nsid=1,subsys=subsys0

In this case, what is the default set-up for shared-ns1? Is this
namespace going to be ready right after the two nvme controllers are
realized? If so, do we iterate over all the namespace devices in the NVM
subsystem and attach them to this controller at initialization time? If
so, I agree with this approach.

>   -device nvme-ns,id=private-ns2,nsid=2,bus=nvme0

This must be the case Keith mentioned of attaching directly to a
controller. It looks nice.

But one concern here is that, for a !shared namespace, if we don't
specify the 'subsys' property and attach it directly to a controller,
it implicitly belongs to subsys0, where nvme0 belongs. It means that the
user should give an nsid different from 1, which is already taken by the
shared namespace.

So, how about we make the subsys property mandatory for the namespace
device and make bus optional? If bus is given, pointing to a controller,
then it means a private namespace; otherwise it can be shared among the
controllers in a subsystem.
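
The rule suggested at the end of this mail (subsys always required on nvme-ns, bus optional and deciding private versus shared) could be sketched as below. This is an illustration only: `parse_props` and `classify_namespace` are hypothetical helpers, and the combined `subsys=...,bus=...` option string is an assumed syntax, not something QEMU accepts today.

```python
from typing import Dict


def parse_props(arg: str) -> Dict[str, str]:
    """Split a '-device' style option string into a property dict."""
    driver, _, rest = arg.partition(",")
    props = {"driver": driver}
    for item in rest.split(","):
        if item:
            key, _, value = item.partition("=")
            props[key] = value
    return props


def classify_namespace(props: Dict[str, str]) -> str:
    # Proposed rule: 'subsys' is mandatory for every namespace device.
    if "subsys" not in props:
        raise ValueError("nvme-ns requires a 'subsys' property")
    # 'bus' is optional: present means private to that controller,
    # absent means shared among all controllers in the subsystem.
    return "private" if "bus" in props else "shared"


shared = classify_namespace(
    parse_props("nvme-ns,id=shared-ns1,nsid=1,subsys=subsys0"))
private = classify_namespace(
    parse_props("nvme-ns,id=private-ns2,nsid=2,subsys=subsys0,bus=nvme0"))
```

Making subsys explicit in both cases means the subsystem that owns the NSID is always visible on the command line, which addresses the concern about a private namespace silently colliding with nsid 1.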