[0/2] RISC-V: ifunced memcpy using new kernel hwprobe interface


Message ID

20230206194819.1679472-1-evan@rivosinc.com

Headers

DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CB6D33858C20
From: Evan Green <evan@rivosinc.com>
To: libc-alpha@sourceware.org
Cc: slewis@rivosinc.com, vineetg@rivosinc.com, palmer@rivosinc.com,
 Evan Green <evan@rivosinc.com>
Subject: [PATCH 0/2] RISC-V: ifunced memcpy using new kernel hwprobe interface
Date: Mon,  6 Feb 2023 11:48:17 -0800
Message-Id: <20230206194819.1679472-1-evan@rivosinc.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: list
Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org
Sender: "Libc-alpha"
 <libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org>

Series

RISC-V: ifunced memcpy using new kernel hwprobe interface

Message

Evan Green Feb. 6, 2023, 7:48 p.m. UTC

This series illustrates the use of a proposed Linux syscall that
enumerates architectural information about the RISC-V cores the system
is running on. In this series we expose a small wrapper function around
the syscall. An ifunc selector for memcpy queries it to see if unaligned
access is "fast" on this hardware. If it is, it selects a newly provided
implementation of memcpy that doesn't work hard at aligning the src and
destination buffers.
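
A minimal sketch of the selection step described above, written as plain C so it runs anywhere. It is not the code from the patches. The key and value names, the `demo_` functions, and the variable standing in for the kernel's answer are all hypothetical; a real selector would be wired up through glibc's ifunc machinery and would call the actual syscall wrapper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key and value for this sketch only; the real names and
   numbers are defined by the kernel series, not here.  */
#define DEMO_KEY_UNALIGNED_PERF 0
#define DEMO_UNALIGNED_FAST     1

struct demo_probe_pair
{
  int64_t key;
  uint64_t value;
};

typedef void *(*demo_memcpy_fn) (void *, const void *, size_t);

/* Stand-in for the syscall wrapper: it answers from a variable instead
   of asking the kernel.  */
uint64_t demo_unaligned_perf = DEMO_UNALIGNED_FAST;

int
demo_hwprobe (struct demo_probe_pair *pairs, size_t count)
{
  for (size_t i = 0; i < count; i++)
    {
      if (pairs[i].key == DEMO_KEY_UNALIGNED_PERF)
        pairs[i].value = demo_unaligned_perf;
      else
        pairs[i].value = 0;
    }
  return 0;
}

/* Conservative default: defer to the C library's own memcpy.  */
void *
demo_memcpy_generic (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* Copies word-sized chunks without aligning either pointer first.  The
   fixed-size memcpy calls express unaligned loads and stores portably.  */
void *
demo_memcpy_noalignment (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  while (n >= sizeof (unsigned long))
    {
      unsigned long word;
      memcpy (&word, s, sizeof word);
      memcpy (d, &word, sizeof word);
      d += sizeof word;
      s += sizeof word;
      n -= sizeof word;
    }
  while (n-- > 0)
    *d++ = *s++;
  return dst;
}

/* The decision an ifunc resolver would make once, at load time.  */
demo_memcpy_fn
demo_select_memcpy (void)
{
  struct demo_probe_pair pair = { DEMO_KEY_UNALIGNED_PERF, 0 };

  if (demo_hwprobe (&pair, 1) == 0 && pair.value == DEMO_UNALIGNED_FAST)
    return demo_memcpy_noalignment;
  return demo_memcpy_generic;
}
```

If the probe fails or reports anything other than "fast", the sketch falls back to the generic routine, which is the safe default.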

This is somewhat of a proof of concept for the syscall itself, but I do
find that in my goofy memcpy test [1], the unaligned memcpy performed at
least as well as the generic C version. This is, however, on QEMU on an M1
Mac, so not a test of any real hardware (more a smoke test that the
implementation isn't silly).

v1 of the Linux series can be found at [2]. I'm about to post v2 (but
haven't yet!); I can reply here with the link once v2 is posted.

[1] https://pastebin.com/Nj8ixpkX
[2] https://yhbt.net/lore/all/20221013163551.6775-1-palmer@rivosinc.com/


Evan Green (2):
  riscv: Add Linux hwprobe syscall support
  riscv: Add and use alignment-ignorant memcpy

 sysdeps/riscv/memcopy.h                       |  28 +++++
 sysdeps/riscv/memcpy.c                        |  65 +++++++++++
 sysdeps/riscv/memcpy_noalignment.S            | 103 ++++++++++++++++++
 sysdeps/unix/sysv/linux/riscv/Makefile        |   8 +-
 sysdeps/unix/sysv/linux/riscv/Versions        |   3 +
 sysdeps/unix/sysv/linux/riscv/hwprobe.c       |  30 +++++
 .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 ++++
 .../unix/sysv/linux/riscv/rv32/arch-syscall.h |   1 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/arch-syscall.h |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h   |  34 ++++++
 sysdeps/unix/sysv/linux/syscall-names.list    |   1 +
 13 files changed, 298 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/riscv/memcopy.h
 create mode 100644 sysdeps/riscv/memcpy.c
 create mode 100644 sysdeps/riscv/memcpy_noalignment.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h

Comments

Richard Henderson Feb. 6, 2023, 9:28 p.m. UTC | #1

Re the syscall:

I question whether the heterogeneous CPU case is something that you really want to query.
In order to handle migration between such CPUs, any such query must return the minimum
level of support.

Remove that possibility, and this becomes a simple array reference.  Now you need to
decide whether a vDSO call, or HWCAP2 as a pointer to read-only data, is more or less
efficient or extensible.
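
One way to picture the "minimum level of support" point, as a sketch under stated assumptions rather than anything the kernel series does: per-core values are reduced once to a single table that is safe on every core, after which a query is just an array index. The two keys, the bitmask and ordered-level encodings, and the `demo_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical keys for this sketch; not the proposed syscall's keys.  */
enum
{
  DEMO_KEY_EXT_MASK,        /* Bitmask: one bit per supported extension.  */
  DEMO_KEY_UNALIGNED_LEVEL, /* Ordered level: higher means faster.  */
  DEMO_NKEYS
};

/* Reduce per-core values to the answer that holds on every core:
   intersect the bitmasks and take the lowest ordered level.  */
void
demo_reduce_cores (const uint64_t percore[][DEMO_NKEYS], size_t ncores,
                   uint64_t common[DEMO_NKEYS])
{
  if (ncores == 0)
    {
      common[DEMO_KEY_EXT_MASK] = 0;
      common[DEMO_KEY_UNALIGNED_LEVEL] = 0;
      return;
    }

  common[DEMO_KEY_EXT_MASK] = percore[0][DEMO_KEY_EXT_MASK];
  common[DEMO_KEY_UNALIGNED_LEVEL] = percore[0][DEMO_KEY_UNALIGNED_LEVEL];

  for (size_t i = 1; i < ncores; i++)
    {
      common[DEMO_KEY_EXT_MASK] &= percore[i][DEMO_KEY_EXT_MASK];
      if (percore[i][DEMO_KEY_UNALIGNED_LEVEL]
          < common[DEMO_KEY_UNALIGNED_LEVEL])
        common[DEMO_KEY_UNALIGNED_LEVEL]
          = percore[i][DEMO_KEY_UNALIGNED_LEVEL];
    }
}

/* With a single system-wide table, a query is a bounds-checked index.  */
uint64_t
demo_lookup (const uint64_t common[DEMO_NKEYS], size_t key)
{
  return key < (size_t) DEMO_NKEYS ? common[key] : 0;
}
```

Once the table is the same for every core, it no longer matters which core a thread migrates to, and the data could in principle be exposed read-only instead of through a call.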


r~

Adhemerval Zanella Feb. 7, 2023, 12:49 p.m. UTC | #2

On 06/02/23 18:28, Richard Henderson via Libc-alpha wrote:
> Re the syscall:
> 
> I question whether the heterogeneous CPU case is something that you really want to query. In order to handle migration between such CPUs, any such query must return the minimum level of support.
> 
> Remove that possibility, and this becomes a simple array reference.  Now you need to decide whether a vDSO call, or HWCAP2 as a pointer to read-only data, is more or less efficient or extensible.

It should at least work if the kernel traps and emulates unaligned accesses or
any instruction not supported by the other core, although it would be really
subpar.  I would expect the kernel to report the minimum ISA as well.

I would also recommend caching the values, as we do for aarch64/x86/powerpc,
to avoid issuing multiple syscalls during symbol resolution (see cpu-features.c).
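
A rough sketch of the caching idea, not taken from any existing cpu-features.c: the probe runs once, its result is stored in a structure, and every later selector reads the stored copy. The structure, the `demo_` names, and the counter standing in for the syscall are hypothetical. The sketch is not thread-safe; it assumes initialization happens before other threads exist.

```c
#include <stdbool.h>
#include <stdint.h>

/* Counts how often the stand-in "syscall" runs, to make the effect of
   caching visible.  */
int demo_probe_calls;

/* Stand-in for the real query; always reports "fast" in this sketch.  */
uint64_t
demo_query_kernel (void)
{
  demo_probe_calls++;
  return 1;
}

struct demo_cpu_features
{
  bool initialized;
  uint64_t unaligned_fast;
};

static struct demo_cpu_features demo_features;

/* Fill the cache on first use; later callers only read it.  */
const struct demo_cpu_features *
demo_get_features (void)
{
  if (!demo_features.initialized)
    {
      demo_features.unaligned_fast = demo_query_kernel ();
      demo_features.initialized = true;
    }
  return &demo_features;
}
```

Each ifunc selector (memcpy, and any added later) would then call `demo_get_features ()` instead of probing on its own, so the cost of the query is paid once per process.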