Message ID | 1420926734-16417-2-git-send-email-richard@nod.at
---|---
State | Accepted
> +	struct ubi_sgl usgl;

Btw, what's in struct ubi_sgl? Can't find that in my tree.

> +static void ubiblock_do_work(struct work_struct *work)
> +{
> +	int ret;
> +	struct ubiblock_pdu *pdu = container_of(work, struct ubiblock_pdu, work);
> +	struct request *req = blk_mq_rq_from_pdu(pdu);
> +
> +	blk_mq_start_request(req);
> +	blk_rq_map_sg(req->q, req, pdu->usgl.sg);

blk_rq_map_sg returns the number of entries actually mapped, which
might be smaller than the number passed in due to merging.
On 13.01.2015 17:25, Christoph Hellwig wrote:
>> +	struct ubi_sgl usgl;
>
> Btw, what's in struct ubi_sgl? Can't find that in my tree.

"[PATCH 1/2] UBI: Add initial support for scatter gather" adds it.

/**
 * struct ubi_sgl - UBI scatter gather list data structure.
 * @list_pos: current position in @sg[]
 * @page_pos: current position in @sg[@list_pos]
 * @sg: the scatter gather list itself
 *
 * ubi_sgl is a wrapper around a scatter list which keeps track of the
 * current position in the list and the current list item such that it
 * can be used across multiple ubi_leb_read_sg() calls.
 */
struct ubi_sgl {
	int list_pos;
	int page_pos;
	struct scatterlist sg[UBI_MAX_SG_COUNT];
};

>> +static void ubiblock_do_work(struct work_struct *work)
>> +{
>> +	int ret;
>> +	struct ubiblock_pdu *pdu = container_of(work, struct ubiblock_pdu, work);
>> +	struct request *req = blk_mq_rq_from_pdu(pdu);
>> +
>> +	blk_mq_start_request(req);
>> +	blk_rq_map_sg(req->q, req, pdu->usgl.sg);
>
> blk_rq_map_sg returns the number of entries actually mapped, which
> might be smaller than the number passed in due to merging.

Yep, but the ubi_sgl has a fixed number of scatterlist entries,
UBI_MAX_SG_COUNT, and I also limit the queue to that using:
blk_queue_max_segments(dev->rq, UBI_MAX_SG_COUNT);

Is there another reason why I should use the return value of
blk_rq_map_sg()? Please also note that the UBI block driver is
read-only.

Thanks,
//richard
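For readers following along without the scatter-gather patch applied, here is a minimal sketch of how such a cursor can be consumed; sgl_copy_out() is a hypothetical helper invented for illustration, not the actual UBI implementation:

/*
 * Illustrative sketch only -- sgl_copy_out() is a hypothetical helper,
 * not part of the UBI patches.  It shows how the list_pos/page_pos
 * cursor lets one ubi_sgl be filled incrementally across several
 * calls: each call resumes exactly where the previous one stopped.
 */
static void sgl_copy_out(struct ubi_sgl *usgl, const char *buf, int len)
{
	while (len > 0) {
		struct scatterlist *sg = &usgl->sg[usgl->list_pos];
		int space = sg->length - usgl->page_pos;
		int chunk = min(len, space);

		/* Fill the current element from the saved offset */
		memcpy(sg_virt(sg) + usgl->page_pos, buf, chunk);
		usgl->page_pos += chunk;
		if (usgl->page_pos == sg->length) {
			/* Element exhausted, move the cursor forward */
			usgl->list_pos++;
			usgl->page_pos = 0;
		}
		buf += chunk;
		len -= chunk;
	}
}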
On 01/13/2015 03:51 PM, Richard Weinberger wrote:
> On 13.01.2015 17:25, Christoph Hellwig wrote:
>>> +	struct ubi_sgl usgl;
>>
>> Btw, what's in struct ubi_sgl? Can't find that in my tree.
>
> "[PATCH 1/2] UBI: Add initial support for scatter gather" adds it.
>
> /**
>  * struct ubi_sgl - UBI scatter gather list data structure.
>  * @list_pos: current position in @sg[]
>  * @page_pos: current position in @sg[@list_pos]
>  * @sg: the scatter gather list itself
>  *
>  * ubi_sgl is a wrapper around a scatter list which keeps track of the
>  * current position in the list and the current list item such that it
>  * can be used across multiple ubi_leb_read_sg() calls.
>  */
> struct ubi_sgl {
> 	int list_pos;
> 	int page_pos;
> 	struct scatterlist sg[UBI_MAX_SG_COUNT];
> };
>
>>> +static void ubiblock_do_work(struct work_struct *work)
>>> +{
>>> +	int ret;
>>> +	struct ubiblock_pdu *pdu = container_of(work, struct ubiblock_pdu, work);
>>> +	struct request *req = blk_mq_rq_from_pdu(pdu);
>>> +
>>> +	blk_mq_start_request(req);
>>> +	blk_rq_map_sg(req->q, req, pdu->usgl.sg);
>>
>> blk_rq_map_sg returns the number of entries actually mapped, which
>> might be smaller than the number passed in due to merging.
>
> Yep, but the ubi_sgl has a fixed number of scatterlist entries,
> UBI_MAX_SG_COUNT, and I also limit the queue to that using:
> blk_queue_max_segments(dev->rq, UBI_MAX_SG_COUNT);
>
> Is there another reason why I should use the return value of
> blk_rq_map_sg()? Please also note that the UBI block driver is
> read-only.

It can return less than what you asked for, if segments are coalesced.
Read/write, doesn't matter. You should always use the returned value as
the indication of how many segments to access in pdu->usgl.sg for data
transfer.
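As a concrete illustration of the pattern Jens describes, a hedged sketch follows; map_and_walk() and its debug loop are invented for illustration, only the blk_rq_map_sg() call itself comes from the patch:

/*
 * Sketch only -- map_and_walk() is invented for illustration.  The key
 * pattern: the return value of blk_rq_map_sg(), not the request's
 * original segment count, bounds the valid sg elements.
 */
static void map_and_walk(struct request *req, struct ubiblock_pdu *pdu)
{
	struct scatterlist *sg;
	int i, nsegs;

	nsegs = blk_rq_map_sg(req->q, req, pdu->usgl.sg);

	/* sg[nsegs] and beyond were not written by the mapping and
	 * must not be used for data transfer. */
	for_each_sg(pdu->usgl.sg, sg, nsegs, i)
		pr_debug("sg[%d]: %u bytes\n", i, sg->length);
}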
On 13.01.2015 23:54, Jens Axboe wrote:
>>> blk_rq_map_sg returns the number of entries actually mapped, which
>>> might be smaller than the number passed in due to merging.
>>
>> Yep, but the ubi_sgl has a fixed number of scatterlist entries,
>> UBI_MAX_SG_COUNT, and I also limit the queue to that using:
>> blk_queue_max_segments(dev->rq, UBI_MAX_SG_COUNT);
>>
>> Is there another reason why I should use the return value of
>> blk_rq_map_sg()? Please also note that the UBI block driver is
>> read-only.
>
> It can return less than what you asked for, if segments are coalesced.
> Read/write, doesn't matter. You should always use the returned value as
> the indication of how many segments to access in pdu->usgl.sg for data
> transfer.

Sorry, I don't fully understand.

Currently the driver does:
to_read = blk_rq_bytes(req);
Then it fills pdu->usgl.sg up to to_read bytes and calls
blk_mq_end_request().

If I understand you correctly, it can happen that blk_rq_bytes()
returns more bytes than blk_rq_map_sg() allocated, right?

Thanks,
//richard
On 01/13/2015 04:17 PM, Richard Weinberger wrote:
> On 13.01.2015 23:54, Jens Axboe wrote:
>>>> blk_rq_map_sg returns the number of entries actually mapped, which
>>>> might be smaller than the number passed in due to merging.
>>>
>>> Yep, but the ubi_sgl has a fixed number of scatterlist entries,
>>> UBI_MAX_SG_COUNT, and I also limit the queue to that using:
>>> blk_queue_max_segments(dev->rq, UBI_MAX_SG_COUNT);
>>>
>>> Is there another reason why I should use the return value of
>>> blk_rq_map_sg()? Please also note that the UBI block driver is
>>> read-only.
>>
>> It can return less than what you asked for, if segments are coalesced.
>> Read/write, doesn't matter. You should always use the returned value as
>> the indication of how many segments to access in pdu->usgl.sg for data
>> transfer.
>
> Sorry, I don't fully understand.
>
> Currently the driver does:
> to_read = blk_rq_bytes(req);
> Then it fills pdu->usgl.sg up to to_read bytes and calls
> blk_mq_end_request().
>
> If I understand you correctly, it can happen that blk_rq_bytes()
> returns more bytes than blk_rq_map_sg() allocated, right?

No, the number of bytes will be the same, no magic is involved :-)

But let's say the initial request has 4 bios, each with 2 pages, for a
total of 8 segments. Let's further assume that the pages in each bio
are contiguous, so that blk_rq_map_sg() will map this to 4 sg elements,
each 2 pages long.

Now, this may already be handled just fine, and you don't need to
update/store the actual sg count. I just looked at the source, and I'm
assuming it'll do the right thing (ubi_read_sg() will bump the active
sg element when that size has been consumed), but I don't have
ubi_read_sg() in my tree to verify.
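Jens' example, worked through with hypothetical 4 KiB pages (the page names and sizes below are illustrative, not from the thread):

/*
 * Request layout before mapping:
 *
 *   bio 0: pages A0 A1    bio 1: pages B0 B1
 *   bio 2: pages C0 C1    bio 3: pages D0 D1    -> 8 segments
 *
 * If each pair is physically contiguous, blk_rq_map_sg() merges them:
 *
 *   sg[0] = A0+A1 (8 KiB)   sg[1] = B0+B1 (8 KiB)
 *   sg[2] = C0+C1 (8 KiB)   sg[3] = D0+D1 (8 KiB)  -> returns 4
 *
 * blk_rq_bytes() still reports 32 KiB either way; only the element
 * count shrinks, so sg[4]..sg[7] hold no mapped data.
 */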
On 14.01.2015 00:30, Jens Axboe wrote:
>> If I understand you correctly, it can happen that blk_rq_bytes()
>> returns more bytes than blk_rq_map_sg() allocated, right?
>
> No, the number of bytes will be the same, no magic is involved :-)

Good to know. :)

> But let's say the initial request has 4 bios, each with 2 pages, for a
> total of 8 segments. Let's further assume that the pages in each bio
> are contiguous, so that blk_rq_map_sg() will map this to 4 sg elements,
> each 2 pages long.
>
> Now, this may already be handled just fine, and you don't need to
> update/store the actual sg count. I just looked at the source, and I'm
> assuming it'll do the right thing (ubi_read_sg() will bump the active
> sg element when that size has been consumed), but I don't have
> ubi_read_sg() in my tree to verify.

Currently the sg count is hard coded to UBI_MAX_SG_COUNT.

I'm sorry, I forgot to CC you and hch on this patch:
https://lkml.org/lkml/2015/1/10/204

Thanks,
//richard
On 01/13/2015 04:36 PM, Richard Weinberger wrote:
> On 14.01.2015 00:30, Jens Axboe wrote:
>>> If I understand you correctly, it can happen that blk_rq_bytes()
>>> returns more bytes than blk_rq_map_sg() allocated, right?
>>
>> No, the number of bytes will be the same, no magic is involved :-)
>
> Good to know. :)
>
>> But let's say the initial request has 4 bios, each with 2 pages, for a
>> total of 8 segments. Let's further assume that the pages in each bio
>> are contiguous, so that blk_rq_map_sg() will map this to 4 sg elements,
>> each 2 pages long.
>>
>> Now, this may already be handled just fine, and you don't need to
>> update/store the actual sg count. I just looked at the source, and I'm
>> assuming it'll do the right thing (ubi_read_sg() will bump the active
>> sg element when that size has been consumed), but I don't have
>> ubi_read_sg() in my tree to verify.
>
> Currently the sg count is hard coded to UBI_MAX_SG_COUNT.

The max count doesn't matter, that just provides you a guarantee that
you'll never receive a request that maps to more than that. The point
I'm trying to make is that if you receive 8 segments and it maps to 4,
then you had better not look at segments 5..8 after the mapping.
Whatever the max is doesn't matter in this conversation.

> I'm sorry, I forgot to CC you and hch on this patch:

Which is as I suspected: you'll do each segment to the length
specified, hence you don't need to track the returned count from
blk_rq_map_sg().
On 14.01.2015 01:23, Jens Axboe wrote:
> On 01/13/2015 04:36 PM, Richard Weinberger wrote:
>> On 14.01.2015 00:30, Jens Axboe wrote:
>>>> If I understand you correctly, it can happen that blk_rq_bytes()
>>>> returns more bytes than blk_rq_map_sg() allocated, right?
>>>
>>> No, the number of bytes will be the same, no magic is involved :-)
>>
>> Good to know. :)
>>
>>> But let's say the initial request has 4 bios, each with 2 pages, for a
>>> total of 8 segments. Let's further assume that the pages in each bio
>>> are contiguous, so that blk_rq_map_sg() will map this to 4 sg elements,
>>> each 2 pages long.
>>>
>>> Now, this may already be handled just fine, and you don't need to
>>> update/store the actual sg count. I just looked at the source, and I'm
>>> assuming it'll do the right thing (ubi_read_sg() will bump the active
>>> sg element when that size has been consumed), but I don't have
>>> ubi_read_sg() in my tree to verify.
>>
>> Currently the sg count is hard coded to UBI_MAX_SG_COUNT.
>
> The max count doesn't matter, that just provides you a guarantee that
> you'll never receive a request that maps to more than that. The point
> I'm trying to make is that if you receive 8 segments and it maps to 4,
> then you had better not look at segments 5..8 after the mapping.
> Whatever the max is doesn't matter in this conversation.
>
>> I'm sorry, I forgot to CC you and hch on this patch:
>
> Which is as I suspected: you'll do each segment to the length
> specified, hence you don't need to track the returned count from
> blk_rq_map_sg().

Thanks a lot for the kind explanation, Jens!
I'll add a comment about the usage of blk_rq_map_sg() to avoid further
confusion. Am I allowed to add your Reviewed-by too?

Thanks,
//richard
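For reference, one possible shape of that comment in ubiblock_do_work(); this is our suggested wording, not necessarily what was merged:

	blk_mq_start_request(req);
	/*
	 * blk_rq_map_sg() may coalesce adjacent segments and thus map
	 * fewer sg elements than the request has segments.  That is
	 * harmless here: ubi_read_sg() walks the list by bytes consumed
	 * and stops after blk_rq_bytes(req) bytes, so it never reaches
	 * the unmapped tail elements and the return value need not be
	 * stored.
	 */
	blk_rq_map_sg(req->q, req, pdu->usgl.sg);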
On 14.01.2015 09:39, Richard Weinberger wrote:
> On 14.01.2015 01:23, Jens Axboe wrote:
>> Which is as I suspected: you'll do each segment to the length
>> specified, hence you don't need to track the returned count from
>> blk_rq_map_sg().
>
> Thanks a lot for the kind explanation, Jens!
> I'll add a comment about the usage of blk_rq_map_sg() to avoid further
> confusion. Am I allowed to add your Reviewed-by too?

Ping?
I'd like to add a Reviewed-by to this patch.

Thanks,
//richard
On 01/26/2015 04:55 PM, Richard Weinberger wrote:
> On 14.01.2015 09:39, Richard Weinberger wrote:
>> On 14.01.2015 01:23, Jens Axboe wrote:
>>> Which is as I suspected: you'll do each segment to the length
>>> specified, hence you don't need to track the returned count from
>>> blk_rq_map_sg().
>>
>> Thanks a lot for the kind explanation, Jens!
>> I'll add a comment about the usage of blk_rq_map_sg() to avoid further
>> confusion. Am I allowed to add your Reviewed-by too?
>
> Ping?
> I'd like to add a Reviewed-by to this patch.

Yup, sorry, looks OK to go in from my point of view. You can add my
Reviewed-by.
On 01/10/2015 06:52 PM, Richard Weinberger wrote:
> Convert the driver to blk-mq.
> Besides moving to the modern block interface, this change also boosts
> the performance of the driver.
>
> nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
> nand: Micron NAND 256MiB 3,3V 8-bit
> nand: 256MiB, SLC, page size: 2048, OOB size: 64
>
> root@debian-armhf:~# dd if=/dev/ubiblock0_0 of=/dev/zero bs=1M
> 243+1 records in
> 243+1 records out
> 255080448 bytes (255 MB) copied, 4.39295 s, 58.1 MB/s
>
> vs.
>
> root@debian-armhf:~# dd if=/dev/ubiblock0_0 of=/dev/zero bs=1M
> 243+1 records in
> 243+1 records out
> 255080448 bytes (255 MB) copied, 2.87676 s, 88.7 MB/s
>
> Cc: hch@infradead.org
> Cc: axboe@fb.com
> Cc: tom.leiming@gmail.com
> Signed-off-by: Richard Weinberger <richard@nod.at>
> Tested-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>

Acked-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>

And thanks a lot for the good work!
diff --git a/drivers/mtd/ubi/block.c b/drivers/mtd/ubi/block.c
index 6b6bce2..00caf46 100644
--- a/drivers/mtd/ubi/block.c
+++ b/drivers/mtd/ubi/block.c
@@ -42,11 +42,12 @@
 #include <linux/list.h>
 #include <linux/mutex.h>
 #include <linux/slab.h>
-#include <linux/vmalloc.h>
 #include <linux/mtd/ubi.h>
 #include <linux/workqueue.h>
 #include <linux/blkdev.h>
+#include <linux/blk-mq.h>
 #include <linux/hdreg.h>
+#include <linux/scatterlist.h>
 #include <asm/div64.h>

 #include "ubi-media.h"
@@ -67,6 +68,11 @@ struct ubiblock_param {
 	char name[UBIBLOCK_PARAM_LEN+1];
 };

+struct ubiblock_pdu {
+	struct work_struct work;
+	struct ubi_sgl usgl;
+};
+
 /* Numbers of elements set in the @ubiblock_param array */
 static int ubiblock_devs __initdata;

@@ -84,11 +90,10 @@ struct ubiblock {
 	struct request_queue *rq;

 	struct workqueue_struct *wq;
-	struct work_struct work;

 	struct mutex dev_mutex;
-	spinlock_t queue_lock;
 	struct list_head list;
+	struct blk_mq_tag_set tag_set;
 };

 /* Linked list of all ubiblock instances */
@@ -181,31 +186,20 @@ static struct ubiblock *find_dev_nolock(int ubi_num, int vol_id)
 	return NULL;
 }

-static int ubiblock_read_to_buf(struct ubiblock *dev, char *buffer,
-				int leb, int offset, int len)
+static int ubiblock_read(struct ubiblock_pdu *pdu)
 {
-	int ret;
+	int ret, leb, offset, bytes_left, to_read;
+	u64 pos;
+	struct request *req = blk_mq_rq_from_pdu(pdu);
+	struct ubiblock *dev = req->q->queuedata;

-	ret = ubi_read(dev->desc, leb, buffer, offset, len);
-	if (ret) {
-		dev_err(disk_to_dev(dev->gd), "%d while reading from LEB %d (offset %d, length %d)",
-			ret, leb, offset, len);
-		return ret;
-	}
-	return 0;
-}
-
-static int ubiblock_read(struct ubiblock *dev, char *buffer,
-			 sector_t sec, int len)
-{
-	int ret, leb, offset;
-	int bytes_left = len;
-	int to_read = len;
-	u64 pos = sec << 9;
+	to_read = blk_rq_bytes(req);
+	pos = blk_rq_pos(req) << 9;

 	/* Get LEB:offset address to read from */
 	offset = do_div(pos, dev->leb_size);
 	leb = pos;
+	bytes_left = to_read;

 	while (bytes_left) {
 		/*
@@ -215,11 +209,10 @@ static int ubiblock_read(struct ubiblock *dev, char *buffer,
 		if (offset + to_read > dev->leb_size)
 			to_read = dev->leb_size - offset;

-		ret = ubiblock_read_to_buf(dev, buffer, leb, offset, to_read);
-		if (ret)
+		ret = ubi_read_sg(dev->desc, leb, &pdu->usgl, offset, to_read);
+		if (ret < 0)
 			return ret;

-		buffer += to_read;
 		bytes_left -= to_read;
 		to_read = bytes_left;
 		leb += 1;
@@ -228,79 +221,6 @@ static int ubiblock_read(struct ubiblock *dev, char *buffer,
 	return 0;
 }

-static int do_ubiblock_request(struct ubiblock *dev, struct request *req)
-{
-	int len, ret;
-	sector_t sec;
-
-	if (req->cmd_type != REQ_TYPE_FS)
-		return -EIO;
-
-	if (blk_rq_pos(req) + blk_rq_cur_sectors(req) >
-	    get_capacity(req->rq_disk))
-		return -EIO;
-
-	if (rq_data_dir(req) != READ)
-		return -ENOSYS; /* Write not implemented */
-
-	sec = blk_rq_pos(req);
-	len = blk_rq_cur_bytes(req);
-
-	/*
-	 * Let's prevent the device from being removed while we're doing I/O
-	 * work. Notice that this means we serialize all the I/O operations,
-	 * but it's probably of no impact given the NAND core serializes
-	 * flash access anyway.
-	 */
-	mutex_lock(&dev->dev_mutex);
-	ret = ubiblock_read(dev, bio_data(req->bio), sec, len);
-	mutex_unlock(&dev->dev_mutex);
-
-	return ret;
-}
-
-static void ubiblock_do_work(struct work_struct *work)
-{
-	struct ubiblock *dev =
-		container_of(work, struct ubiblock, work);
-	struct request_queue *rq = dev->rq;
-	struct request *req;
-	int res;
-
-	spin_lock_irq(rq->queue_lock);
-
-	req = blk_fetch_request(rq);
-	while (req) {
-
-		spin_unlock_irq(rq->queue_lock);
-		res = do_ubiblock_request(dev, req);
-		spin_lock_irq(rq->queue_lock);
-
-		/*
-		 * If we're done with this request,
-		 * we need to fetch a new one
-		 */
-		if (!__blk_end_request_cur(req, res))
-			req = blk_fetch_request(rq);
-	}
-
-	spin_unlock_irq(rq->queue_lock);
-}
-
-static void ubiblock_request(struct request_queue *rq)
-{
-	struct ubiblock *dev;
-	struct request *req;
-
-	dev = rq->queuedata;
-
-	if (!dev)
-		while ((req = blk_fetch_request(rq)) != NULL)
-			__blk_end_request_all(req, -ENODEV);
-	else
-		queue_work(dev->wq, &dev->work);
-}
-
 static int ubiblock_open(struct block_device *bdev, fmode_t mode)
 {
 	struct ubiblock *dev = bdev->bd_disk->private_data;
@@ -374,6 +294,57 @@ static const struct block_device_operations ubiblock_ops = {
 	.getgeo	= ubiblock_getgeo,
 };

+static void ubiblock_do_work(struct work_struct *work)
+{
+	int ret;
+	struct ubiblock_pdu *pdu = container_of(work, struct ubiblock_pdu, work);
+	struct request *req = blk_mq_rq_from_pdu(pdu);
+
+	blk_mq_start_request(req);
+	blk_rq_map_sg(req->q, req, pdu->usgl.sg);
+
+	ret = ubiblock_read(pdu);
+	blk_mq_end_request(req, ret);
+}
+
+static int ubiblock_queue_rq(struct blk_mq_hw_ctx *hctx,
+			     const struct blk_mq_queue_data *bd)
+{
+	struct request *req = bd->rq;
+	struct ubiblock *dev = hctx->queue->queuedata;
+	struct ubiblock_pdu *pdu = blk_mq_rq_to_pdu(req);
+
+	if (req->cmd_type != REQ_TYPE_FS)
+		return BLK_MQ_RQ_QUEUE_ERROR;
+
+	if (rq_data_dir(req) != READ)
+		return BLK_MQ_RQ_QUEUE_ERROR; /* Write not implemented */
+
+	ubi_sgl_init(&pdu->usgl);
+	queue_work(dev->wq, &pdu->work);
+
+	return BLK_MQ_RQ_QUEUE_OK;
+}
+
+static int ubiblock_init_request(void *data, struct request *req,
+				 unsigned int hctx_idx,
+				 unsigned int request_idx,
+				 unsigned int numa_node)
+{
+	struct ubiblock_pdu *pdu = blk_mq_rq_to_pdu(req);
+
+	sg_init_table(pdu->usgl.sg, UBI_MAX_SG_COUNT);
+	INIT_WORK(&pdu->work, ubiblock_do_work);
+
+	return 0;
+}
+
+static struct blk_mq_ops ubiblock_mq_ops = {
+	.queue_rq	= ubiblock_queue_rq,
+	.init_request	= ubiblock_init_request,
+	.map_queue	= blk_mq_map_queue,
+};
+
 int ubiblock_create(struct ubi_volume_info *vi)
 {
 	struct ubiblock *dev;
@@ -417,13 +388,27 @@ int ubiblock_create(struct ubi_volume_info *vi)
 	set_capacity(gd, disk_capacity);
 	dev->gd = gd;

-	spin_lock_init(&dev->queue_lock);
-	dev->rq = blk_init_queue(ubiblock_request, &dev->queue_lock);
+	dev->tag_set.ops = &ubiblock_mq_ops;
+	dev->tag_set.queue_depth = 64;
+	dev->tag_set.numa_node = NUMA_NO_NODE;
+	dev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+	dev->tag_set.cmd_size = sizeof(struct ubiblock_pdu);
+	dev->tag_set.driver_data = dev;
+	dev->tag_set.nr_hw_queues = 1;
+
+	ret = blk_mq_alloc_tag_set(&dev->tag_set);
+	if (ret) {
+		dev_err(disk_to_dev(dev->gd), "blk_mq_alloc_tag_set failed");
+		goto out_put_disk;
+	}
+
+	dev->rq = blk_mq_init_queue(&dev->tag_set);
 	if (!dev->rq) {
-		dev_err(disk_to_dev(gd), "blk_init_queue failed");
+		dev_err(disk_to_dev(gd), "blk_mq_init_queue failed");
 		ret = -ENODEV;
-		goto out_put_disk;
+		goto out_free_tags;
 	}
+	blk_queue_max_segments(dev->rq, UBI_MAX_SG_COUNT);

 	dev->rq->queuedata = dev;
 	dev->gd->queue = dev->rq;

@@ -437,7 +422,6 @@ int ubiblock_create(struct ubi_volume_info *vi)
 		ret = -ENOMEM;
 		goto out_free_queue;
 	}
-	INIT_WORK(&dev->work, ubiblock_do_work);

 	mutex_lock(&devices_mutex);
 	list_add_tail(&dev->list, &ubiblock_devices);
@@ -451,6 +435,8 @@ int ubiblock_create(struct ubi_volume_info *vi)

 out_free_queue:
 	blk_cleanup_queue(dev->rq);
+out_free_tags:
+	blk_mq_free_tag_set(&dev->tag_set);
 out_put_disk:
 	put_disk(dev->gd);
 out_free_dev:
@@ -461,8 +447,13 @@ out_free_dev:

 static void ubiblock_cleanup(struct ubiblock *dev)
 {
+	/* Stop new requests to arrive */
 	del_gendisk(dev->gd);
+	/* Flush pending work */
+	destroy_workqueue(dev->wq);
+	/* Finally destroy the blk queue */
 	blk_cleanup_queue(dev->rq);
+	blk_mq_free_tag_set(&dev->tag_set);
 	dev_info(disk_to_dev(dev->gd), "released");
 	put_disk(dev->gd);
 }
@@ -490,9 +481,6 @@ int ubiblock_remove(struct ubi_volume_info *vi)
 	list_del(&dev->list);
 	mutex_unlock(&devices_mutex);

-	/* Flush pending work and stop this workqueue */
-	destroy_workqueue(dev->wq);
-
 	ubiblock_cleanup(dev);
 	mutex_unlock(&dev->dev_mutex);
 	kfree(dev);
@@ -620,8 +608,6 @@ static void ubiblock_remove_all(void)
 	struct ubiblock *dev;

 	list_for_each_entry_safe(dev, next, &ubiblock_devices, list) {
-		/* Flush pending work and stop workqueue */
-		destroy_workqueue(dev->wq);
 		/* The module is being forcefully removed */
 		WARN_ON(dev->desc);
 		/* Remove from device list */