[V4,0/3] scsi: core: avoid big pre-allocation for sg list

Message ID 20190428073932.9898-1-ming.lei@redhat.com

Message

Ming Lei April 28, 2019, 7:39 a.m. UTC
Hi,

Since SCSI gained blk-mq support, a big sg list has been pre-allocated,
which is very unfriendly to memory consumption.

There were Red Hat internal reports that some scsi_debug based tests
could no longer be run because the pre-allocation was too big.

Also, lpfc users complained that 1GB+ of RAM is pre-allocated for a single
HBA.
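
To give a feel for how a per-request sg list can add up to this order of
magnitude, here is a rough back-of-the-envelope sketch. The helper name
prealloc_bytes() and every number fed to it are assumptions chosen for
illustration (queue count, queue depth, 128 entries per request, 32 bytes
per entry); none of them are measurements from the reports above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes of sg list pre-allocated for one host when
 * every request of every hardware queue carries its own sg list.
 * All inputs are supplied by the caller; nothing here is read from a
 * real driver.
 */
static uint64_t prealloc_bytes(uint64_t hw_queues, uint64_t queue_depth,
                               uint64_t sg_entries, uint64_t entry_bytes)
{
    /* A zero-sized entry would make the estimate meaningless. */
    assert(entry_bytes != 0);

    return hw_queues * queue_depth * sg_entries * entry_bytes;
}
```

With the assumed values of 64 hardware queues, a queue depth of 4096, 128
entries per request and 32 bytes per entry, prealloc_bytes(64, 4096, 128, 32)
comes to 1 GiB, while the same host with only 2 inline entries per request,
prealloc_bytes(64, 4096, 2, 32), needs 16 MiB.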

In the 1st patch, sg_alloc_table_chained() is improved to support a
variable size for the 1st pre-allocated SGL, as suggested by Christoph.

The other two patches try to address this issue by allocating the sg list
at runtime, while pre-allocating one or two inline sg entries for small IO.
This follows NVMe's approach to sg list allocation.
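
The idea of "small inline list, runtime allocation for the rest" can be
sketched in plain userspace C as below. This is only a toy model: the
toy_* names, the struct layout and the inline count of 2 are hypothetical,
and it simply switches to a heap array for large IO, whereas the kernel's
chained sg tables are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_INLINE_SG_CNT 2 /* hypothetical number of inline entries */

/* Stand-in for struct scatterlist; not the kernel layout. */
struct toy_sg {
    void *addr;
    unsigned int len;
};

struct toy_cmd {
    struct toy_sg inline_sg[TOY_INLINE_SG_CNT]; /* allocated with the command */
    struct toy_sg *sgl;                         /* inline_sg or a heap array */
    unsigned int nents;
};

/* Small IO reuses the inline entries; larger IO allocates at runtime. */
static int toy_sg_alloc(struct toy_cmd *cmd, unsigned int nents)
{
    assert(cmd != NULL);

    if (nents == 0)
        return -1;

    if (nents <= TOY_INLINE_SG_CNT) {
        cmd->sgl = cmd->inline_sg;
    } else {
        cmd->sgl = calloc(nents, sizeof(*cmd->sgl));
        if (!cmd->sgl)
            return -1;
    }
    cmd->nents = nents;
    return 0;
}

static bool toy_sg_is_inline(const struct toy_cmd *cmd)
{
    return cmd->sgl == cmd->inline_sg;
}

/* Only the runtime-allocated list is freed; inline storage lives on. */
static void toy_sg_free(struct toy_cmd *cmd)
{
    if (cmd->sgl && !toy_sg_is_inline(cmd))
        free(cmd->sgl);
    cmd->sgl = NULL;
    cmd->nents = 0;
}
```

In this model a 2-entry request never touches the allocator, and the cost
of a large list is paid only while such a request is in flight.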

V4:
	- add parameter to sg_alloc_table_chained()/sg_free_table_chained()
	directly, and update current callers

V3:
	- improve sg_alloc_table_chained() to accept a variable size for
	the 1st pre-allocated SGL
	- apply the improved sg API to address the big pre-allocation
	issue

V2:
	- move inline sg table initialization into one helper
	- introduce new helper for getting inline sg
	- comment log fix


Ming Lei (3):
  lib/sg_pool.c: improve APIs for allocating sg pool
  scsi: core: avoid to pre-allocate big chunk for protection meta data
  scsi: core: avoid to pre-allocate big chunk for sg list

 drivers/nvme/host/fc.c            |  7 ++++---
 drivers/nvme/host/rdma.c          |  7 ++++---
 drivers/nvme/target/loop.c        |  4 ++--
 drivers/scsi/scsi_lib.c           | 31 ++++++++++++++++++++++---------
 include/linux/scatterlist.h       | 11 +++++++----
 lib/scatterlist.c                 | 36 +++++++++++++++++++++++-------------
 lib/sg_pool.c                     | 37 +++++++++++++++++++++++++++----------
 net/sunrpc/xprtrdma/svc_rdma_rw.c |  5 +++--
 8 files changed, 92 insertions(+), 46 deletions(-)

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: netdev@vger.kernel.org
Cc: linux-nvme@lists.infradead.org

Comments

Ming Lei May 5, 2019, 1:10 a.m. UTC | #1
Hi Martin,

Could you consider merging this patchset for 5.2 if you are fine with it?


Thanks,
Ming

Martin K. Petersen May 14, 2019, 2:06 a.m. UTC | #2
Ming,

> Since supporting to blk-mq, big pre-allocation for sg list is
> introduced, this way is very unfriendly wrt. memory consumption.

Applied to 5.3/scsi-queue with some clarifications to the commit
descriptions.

I am not entirely sold on 1 for the inline protection SGL size. NVMe
over PCIe is pretty constrained thanks to the metadata pointer whereas
SCSI DIX uses a real SGL for the PI. Consequently, straddling a page is
not that uncommon for large, sequential I/Os.

But let's try it out. If performance suffers substantially, we may want
to bump it to 2.
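
As a rough illustration of the page-straddling concern, the sketch below
estimates how many pages a protection information buffer touches. It
assumes 8 bytes of PI per logical block, a 4096-byte page, and that each
page touched needs its own SG entry (no merging of contiguous pages); the
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u /* assumed page size in bytes */
#define ASSUMED_PI_TUPLE  8u    /* assumed PI bytes per logical block */

/* Bytes of protection information needed for an I/O of io_bytes. */
static uint64_t pi_bytes(uint64_t io_bytes, uint32_t block_size)
{
    assert(block_size != 0);

    return io_bytes / block_size * ASSUMED_PI_TUPLE;
}

/* Pages touched by a buffer of len bytes starting offset_in_page bytes
 * into its first page. */
static uint64_t pages_spanned(uint64_t offset_in_page, uint64_t len)
{
    assert(offset_in_page < ASSUMED_PAGE_SIZE);

    if (len == 0)
        return 0;

    return (offset_in_page + len + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;
}
```

Under these assumptions a 256 KiB I/O with 512-byte blocks carries exactly
one page of PI, so it fits a single entry only when the buffer is page
aligned; at an offset of 512 bytes it already spans 2 pages, and a 1 MiB
I/O spans 4 pages even when aligned.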