[4/4] S390: Implement mempcpy with help of memcpy. [BZ #19765]

On 05/12/2016 04:10 PM, Adhemerval Zanella wrote:
>
>
> On 09/05/2016 11:15, Stefan Liebler wrote:
>> On 05/05/2016 06:36 PM, H.J. Lu wrote:
>>> On Thu, May 5, 2016 at 9:34 AM, Adhemerval Zanella
>>> <adhemerval.zanella@linaro.org> wrote:
>>>>
>>>>
>>>>> On May 5, 2016, at 11:45, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>
>>>>> On Thu, May 5, 2016 at 7:15 AM, Adhemerval Zanella
>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>
>>>>>>
>>>>>>> On 05/05/2016 10:37, H.J. Lu wrote:
>>>>>>> On Wed, May 4, 2016 at 1:58 PM, Adhemerval Zanella
>>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 04/05/2016 17:51, Wilco Dijkstra wrote:
>>>>>>>>> Adhemerval Zanella wrote:
>>>>>>>>>>
>>>>>>>>>> But my point is all the architectures which provide an optimized mempcpy is
>>>>>>>>>> though either 1. jump directly to optimized memcpy (s390 case for this patchset),
>>>>>>>>>> 2. clonning the same memcpy implementation and adjusting the pointers (x86_64) or
>>>>>>>>>> 3. using a similar strategy for both implementations (powerpc).
>>>>>>>>>
>>>>>>>>> Indeed, which of those are used doesn't matter much.
>>>>>>>>>
>>>>>>>>>> So for this change I am proposing compiler support won't be required because both
>>>>>>>>>> memcpy and __mempcpy will be transformed to memcpy + s.  Based on assumption that
>>>>>>>>>> memcpy is fast as mempcpy implementation I think there is no need to just add
>>>>>>>>>> this micro-optimization to only s390, but rather make is general.
>>>>>>>>>
>>>>>>>>> GLIBC already has this optimization in the generic string header, it's just that s390 wants
>>>>>>>>> to do something different again. As long as GCC isn't fixed this isn't possible to support
>>>>>>>>> s390 without this header workaround. And we need GCC to improve so things work
>>>>>>>>> better for all the other C libraries...
>>>>>>>>
>>>>>>>> But the current one at string/string.h is only enabled with !defined _HAVE_STRING_ARCH_mempcpy,
>>>>>>>> so if a port actually adds a mempcpy one it won't be enabled.  What I am trying to argue it
>>>>>>>> to just remove the !defined _HAVE_STRING_ARCH_mempcpy and enable it as default for all
>>>>>>>> ports.
>>>>>>>
>>>>>>> Please don't enable it for x86.  Calling memcpy means we have to
>>>>>>> save and restore 2 registers for no good reasons.
>>>>>>
>>>>>> Yes, direct call will require save and restore the size for further add
>>>>>> and this is true for most architectures.  My question is if does this
>>>>>> really matter in currently GLIBC internal usage and on programs that
>>>>>> might use it compared against the burden of keeping the various
>>>>>> string*.h header in check for multiple architectures or adding this
>>>>>> logic (mempcpy transformation to memcpy) on compiler.
>>>>>
>>>>> What burden? There is nothing to do in glibc for x86.  GCC can
>>>>> inline mempcpy for x86.
>>>>
>>>> In fact I am objecting all the bits GLIBC added on string*.h that only adds complexity for some micro-optimizations. For x86 I do agree that transforming mempcpy to memcpy is no the best strategy.
>>>>
>>>> My rationale is to avoid add even more arch-specific bits in installed headers to add such optimizations.
>>>
>>> I believe most of those micro-optimizations belong to GCC, not glibc.
>>> Of course, we should keep the existing ones for older GCCs.  We
>>> should avoid adding new ones.
>>>
>>
>> Does this mean, the proposed way is to not add a macro for mempcpy in sysdeps/s390/bits/string.h or e.g. for another architecture?
>>
>> If _HAVE_STRING_ARCH_mempcpy is not defined for s390, then it will always use the macro defined in string/string.h, which inlines memcpy + n and the mempcpy-function is not called. The memcpy is transformed to mvc, mvhi for e.g. constant lengths. Otherwise, the memcpy-function is called and the length will be added after the call.
>>
>> According to PR70140 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70140), the macro in string/string.h will be removed after this bug is fixed in GCC?
>> Then I think the decision, if mempcpy or memcpy is called or an inline version is emitted, will be done per architecture backend?
>>
>> If this is all true, then the mempcpy-function will be called in future if it makes sense!?
>>
>>
>> If _HAVE_STRING_ARCH_mempcpy will be defined for s390, then mempcpy-function is always called - even for cases with constant length where mvc or mvhi is emitted.
>> This is definitely not the intention of this patch.
>> After fixing PR70140, GCC will correctly handle the constant length cases!?
>
> What I am proposing is to avoid add more arch-specific optimization on installed
> string*.h headers and instead work on adding them on compiler side.  My view is
> we should cleanup as much as possible the string headers and only add optimization
> that are more architecture neutral.
>
> Related to patch, my understanding is s390x does not really provide an optimized
> mempcpy (it uses the default mempcpy.c) and I think a better approach would be
> to add an optimized mempcpy like x88_64 (c365e615f7429aee302f8af7bf07ae262278febb).
> The mempcpy won't be optimized directly to mvc instruction, but at least it will
> call the optimized memcpy.  The full optimization will be done by correctly
> handling the transformation on compiler size (the PR#70140 as you referred).
>
>

Okay. Here is the updated patch. It implements mempcpy with memcpy as 
before, but does not change the s390-specific string.h and 
_HAVE_STRING_ARCH_mempcpy is not defined. Thus at the moment mempcpy() 
won't be called, but instead memcpy + n is inlined by the mempcpy macro 
in string/string.h. After fxing PR70140, either the mempcpy macro in 
string/string.h has to be removed or _HAVE_STRING_ARCH_mempcpy has to be 
defined for S390.

Okay to commit?

[4/4] S390: Implement mempcpy with help of memcpy. [BZ #19765]

Commit Message

Comments

Patch