diff mbox series

elf: Replace memcmp with __memcmpeq for variable size

Message ID 20220206210914.1593336-1-hjl.tools@gmail.com
State New
Headers show
Series elf: Replace memcmp with __memcmpeq for variable size | expand

Commit Message

H.J. Lu Feb. 6, 2022, 9:09 p.m. UTC
Replace memcmp with __memcmpeq when a boolean return is all that is
needed.  __memcmpeq is an alias of memcmp by default and it can be
optimized by architectures.

Add <memcmp-eq.h> to define memcmp_eq which calls memcmp by default.

On x86-64, memcmp_eq calls __memcmpeq for variable size:

1. ld.so size
   text	   data	    bss	    dec	    hex	filename
 203170	  10984	    456	 214610	  34652	ld.so (before)
 204146	  10984	    456	 215586	  34a22	ld.so (after)
2. Cycles to run "sprof":

Before:

   1593248:
   1593248:	runtime linker statistics:
   1593248:	  total startup time in dynamic loader: 123726 cycles
   1593248:	            time needed for relocation: 36187 cycles (29.2%)
   1593248:	                 number of relocations: 103
   1593248:	      number of relocations from cache: 3
   1593248:	        number of relative relocations: 1355
   1593248:	           time needed to load objects: 34577 cycles (27.9%)
Try `sprof --help' or `sprof --usage' for more information.
   1593248:
   1593248:	runtime linker statistics:
   1593248:	           final number of relocations: 108
   1593248:	final number of relocations from cache: 3

After:

   1593300:
   1593300:	runtime linker statistics:
   1593300:	  total startup time in dynamic loader: 120229 cycles
   1593300:	            time needed for relocation: 33655 cycles (27.9%)
   1593300:	                 number of relocations: 103
   1593300:	      number of relocations from cache: 3
   1593300:	        number of relative relocations: 1355
   1593300:	           time needed to load objects: 32114 cycles (26.7%)
Try `sprof --help' or `sprof --usage' for more information.
   1593300:
   1593300:	runtime linker statistics:
   1593300:	           final number of relocations: 108
   1593300:	final number of relocations from cache: 3
---
 elf/cache.c                              | 20 ++++++------
 elf/dl-cache.c                           | 12 ++++----
 elf/dl-diagnostics.c                     |  2 +-
 elf/dl-hwcaps_split.c                    |  2 +-
 elf/dl-load.c                            | 25 ++++++++-------
 elf/dl-profile.c                         |  6 ++--
 elf/ldconfig.c                           |  5 +--
 elf/pldd.c                               |  2 +-
 elf/readelflib.c                         |  4 +--
 elf/readlib.c                            |  5 +--
 elf/rtld.c                               | 39 ++++++++++++------------
 elf/sprof.c                              | 12 ++++----
 include/string.h                         |  3 ++
 sysdeps/generic/memcmp-eq.h              | 24 +++++++++++++++
 sysdeps/x86_64/memcmp-eq.h               | 26 ++++++++++++++++
 sysdeps/x86_64/multiarch/memcmpeq-sse2.S |  4 ++-
 16 files changed, 127 insertions(+), 64 deletions(-)
 create mode 100644 sysdeps/generic/memcmp-eq.h
 create mode 100644 sysdeps/x86_64/memcmp-eq.h

Comments

Florian Weimer Feb. 6, 2022, 10:19 p.m. UTC | #1
* H. J. Lu via Libc-alpha:

> diff --git a/elf/dl-cache.c b/elf/dl-cache.c
> index 88bf78ad7c..8574d4ded1 100644
> --- a/elf/dl-cache.c
> +++ b/elf/dl-cache.c
> @@ -72,7 +72,7 @@ glibc_hwcaps_compare (uint32_t left_index, struct dl_hwcaps_priority *right)
>      to_compare = left_name_length;
>    else
>      to_compare = right->name_length;
> -  int cmp = memcmp (left_name, right->name, to_compare);
> +  int cmp = memcmp_eq (left_name, right->name, to_compare);
>    if (cmp != 0)
>      return cmp;
>    if (left_name_length < right->name_length)

This change is not correct.

The x86-specific <memcmp-eq.h> optimization probably applies to other
targets as well.  I also do not quite see where the performance
benefits come from.  None of the changed spots look particularly hot
to me.

And I thought the assumption was that GCC would perform this optimization?
H.J. Lu Feb. 7, 2022, 12:20 a.m. UTC | #2
On Sun, Feb 6, 2022 at 2:19 PM Florian Weimer <fw@deneb-ng.enyo.de> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > diff --git a/elf/dl-cache.c b/elf/dl-cache.c
> > index 88bf78ad7c..8574d4ded1 100644
> > --- a/elf/dl-cache.c
> > +++ b/elf/dl-cache.c
> > @@ -72,7 +72,7 @@ glibc_hwcaps_compare (uint32_t left_index, struct dl_hwcaps_priority *right)
> >      to_compare = left_name_length;
> >    else
> >      to_compare = right->name_length;
> > -  int cmp = memcmp (left_name, right->name, to_compare);
> > +  int cmp = memcmp_eq (left_name, right->name, to_compare);
> >    if (cmp != 0)
> >      return cmp;
> >    if (left_name_length < right->name_length)
>
> This change is not correct.

Fixed.

> The x86-specific <memcmp-eq.h> optimization probably applies to other
> targets as well.  I also do not quite see where the performance

It is useful only if __memcmpeq isn't an alias of memcmp.  I certainly
don't mind moving x86-64 <memcmp-eq.h> to generic.

> benefits come from.  None of the changed spots look particularly hot
> to me.

These are noises.  Here are the new data:

The cycles to run "elf/tst-relsort1 --direct" which calls __memcmpeq
24 times in ld.so:

Before:

     62704:
     62704: runtime linker statistics:
     62704:   total startup time in dynamic loader: 130771 cycles
     62704:             time needed for relocation: 32153 cycles (24.5%)
     62704:                  number of relocations: 97
     62704:       number of relocations from cache: 3
     62704:         number of relative relocations: 1347
     62704:            time needed to load objects: 43704 cycles (33.4%)
     62704:
     62704: runtime linker statistics:
     62704:            final number of relocations: 131
     62704: final number of relocations from cache: 3

After:

     62705:
     62705: runtime linker statistics:
     62705:   total startup time in dynamic loader: 117103 cycles
     62705:             time needed for relocation: 28327 cycles (24.1%)
     62705:                  number of relocations: 97
     62705:       number of relocations from cache: 3
     62705:         number of relative relocations: 1347
     62705:            time needed to load objects: 39550 cycles (33.7%)
     62705:
     62705: runtime linker statistics:
     62705:            final number of relocations: 131
     62705: final number of relocations from cache: 3

These numbers change for each run.  __memcmpeq has the lower
cycles.

> And I thought the assumption was that GCC would perform this optimization?

Yes, but it will take a while to implement it in GCC.
Florian Weimer Feb. 7, 2022, 10:40 a.m. UTC | #3
* H. J. Lu via Libc-alpha:

>> benefits come from.  None of the changed spots look particularly hot
>> to me.
>
> These are noises.  Here are the new data:
>
> The cycles to run "elf/tst-relsort1 --direct" which calls __memcmpeq
> 24 times in ld.so:
>
> Before:
>
>      62704:
>      62704: runtime linker statistics:
>      62704:   total startup time in dynamic loader: 130771 cycles
>      62704:             time needed for relocation: 32153 cycles (24.5%)
>      62704:                  number of relocations: 97
>      62704:       number of relocations from cache: 3
>      62704:         number of relative relocations: 1347
>      62704:            time needed to load objects: 43704 cycles (33.4%)
>      62704:
>      62704: runtime linker statistics:
>      62704:            final number of relocations: 131
>      62704: final number of relocations from cache: 3
>
> After:
>
>      62705:
>      62705: runtime linker statistics:
>      62705:   total startup time in dynamic loader: 117103 cycles
>      62705:             time needed for relocation: 28327 cycles (24.1%)
>      62705:                  number of relocations: 97
>      62705:       number of relocations from cache: 3
>      62705:         number of relative relocations: 1347
>      62705:            time needed to load objects: 39550 cycles (33.7%)
>      62705:
>      62705: runtime linker statistics:
>      62705:            final number of relocations: 131
>      62705: final number of relocations from cache: 3
>
> These numbers change for each run.  __memcmpeq has the lower
> cycles.

This must be something else.  According to the numbers, we save ~159
between ~569 cycles per __memcmpeq call.  That is just not realistic.

Thanks,
Florian
H.J. Lu Feb. 7, 2022, 1:13 p.m. UTC | #4
On Mon, Feb 7, 2022 at 2:41 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> >> benefits come from.  None of the changed spots look particularly hot
> >> to me.
> >
> > These are noises.  Here are the new data:
> >
> > The cycles to run "elf/tst-relsort1 --direct" which calls __memcmpeq
> > 24 times in ld.so:
> >
> > Before:
> >
> >      62704:
> >      62704: runtime linker statistics:
> >      62704:   total startup time in dynamic loader: 130771 cycles
> >      62704:             time needed for relocation: 32153 cycles (24.5%)
> >      62704:                  number of relocations: 97
> >      62704:       number of relocations from cache: 3
> >      62704:         number of relative relocations: 1347
> >      62704:            time needed to load objects: 43704 cycles (33.4%)
> >      62704:
> >      62704: runtime linker statistics:
> >      62704:            final number of relocations: 131
> >      62704: final number of relocations from cache: 3
> >
> > After:
> >
> >      62705:
> >      62705: runtime linker statistics:
> >      62705:   total startup time in dynamic loader: 117103 cycles
> >      62705:             time needed for relocation: 28327 cycles (24.1%)
> >      62705:                  number of relocations: 97
> >      62705:       number of relocations from cache: 3
> >      62705:         number of relative relocations: 1347
> >      62705:            time needed to load objects: 39550 cycles (33.7%)
> >      62705:
> >      62705: runtime linker statistics:
> >      62705:            final number of relocations: 131
> >      62705: final number of relocations from cache: 3
> >
> > These numbers change for each run.  __memcmpeq has the lower
> > cycles.
>
> This must be something else.  According to the numbers, we save ~159
> between ~569 cycles per __memcmpeq call.  That is just not realistic.

I removed the cycle info from the commit log.
Florian Weimer Feb. 7, 2022, 1:19 p.m. UTC | #5
* H. J. Lu:

>> This must be something else.  According to the numbers, we save ~159
>> between ~569 cycles per __memcmpeq call.  That is just not realistic.
>
> I removed the cycle info from the commit log.

If we can't show the change is beneficial, is it worth the additional
1000 or so bytes in the loader text?

Thanks,
Florian
H.J. Lu Feb. 7, 2022, 1:27 p.m. UTC | #6
On Mon, Feb 7, 2022 at 5:19 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> >> This must be something else.  According to the numbers, we save ~159
> >> between ~569 cycles per __memcmpeq call.  That is just not realistic.
> >
> > I removed the cycle info from the commit log.
>
> If we can't show the change is beneficial, is it worth the additional
> 1000 or so bytes in the loader text?
>

$ LD_DEBUG=statistics elf/tst-relsort1 --direct

does show improvements when __memcmpeq is called 24 times.
But the cycle number changes for each run.  The overall trend is
faster.  It will be more obvious when memcmp is called more often.
Adhemerval Zanella Netto Feb. 7, 2022, 1:30 p.m. UTC | #7
On 07/02/2022 10:27, H.J. Lu via Libc-alpha wrote:
> On Mon, Feb 7, 2022 at 5:19 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu:
>>
>>>> This must be something else.  According to the numbers, we save ~159
>>>> between ~569 cycles per __memcmpeq call.  That is just not realistic.
>>>
>>> I removed the cycle info from the commit log.
>>
>> If we can't show the change is beneficial, is it worth the additional
>> 1000 or so bytes in the loader text?
>>
> 
> $ LD_DEBUG=statistics elf/tst-relsort1 --direct
> 
> does show improvements when __memcmpeq is called 24 times.
> But the cycle number changes for each run.  The overall trend is
> faster.  It will be more obvious when memcmp is called more often.
> 

My understanding is this optimization would eventually be implemented by
the compiler, so maybe it would be better to let it optimize if suitable
(similar to what we are aiming for math code).
H.J. Lu Feb. 7, 2022, 2 p.m. UTC | #8
On Mon, Feb 7, 2022 at 5:30 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 07/02/2022 10:27, H.J. Lu via Libc-alpha wrote:
> > On Mon, Feb 7, 2022 at 5:19 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * H. J. Lu:
> >>
> >>>> This must be something else.  According to the numbers, we save ~159
> >>>> between ~569 cycles per __memcmpeq call.  That is just not realistic.
> >>>
> >>> I removed the cycle info from the commit log.
> >>
> >> If we can't show the change is beneficial, is it worth the additional
> >> 1000 or so bytes in the loader text?
> >>
> >
> > $ LD_DEBUG=statistics elf/tst-relsort1 --direct
> >
> > does show improvements when __memcmpeq is called 24 times.
> > But the cycle number changes for each run.  The overall trend is
> > faster.  It will be more obvious when memcmp is called more often.
> >
>
> My understanding is this optimization would eventually be implemented by

What kinds of codes should compilers generate?  For glibc internal
usage, we can write codes in such a way that the generated codes
are very similar to what compilers should generate.

> the compiler, so maybe it would be better to let it optimize if suitable
> (similar to what we are aiming for math code).
Joseph Myers Feb. 7, 2022, 8:35 p.m. UTC | #9
On Sun, 6 Feb 2022, H.J. Lu via Libc-alpha wrote:

> Replace memcmp with __memcmpeq when a boolean return is all that is
> needed.  __memcmpeq is an alias of memcmp by default and it can be
> optimized by architectures.

I don't think we should make this sort of change to glibc source code at 
all.  We should write the code to be clear for readers - meaning using the 
standard C function name, which is memcmp - and leave the compiler to 
optimize or not.

In general we should move away from writing such internal function calls 
directly in the source code, where it can reasonably call the standard 
function names instead.  Cf. how we handle things for various internal 
calls in libm: call standard names the compiler might inline (don't 
provide e.g. inline versions of those functions in glibc's internal 
headers), with asm redirection to reserved names to cover the case when 
those function calls don't get inlined.
diff mbox series

Patch

diff --git a/elf/cache.c b/elf/cache.c
index dbf4c83a7a..810028b1bc 100644
--- a/elf/cache.c
+++ b/elf/cache.c
@@ -340,14 +340,15 @@  print_cache (const char *cache_name)
   const char *cache_data;
   int format = 0;
 
-  if (memcmp (cache->magic, CACHEMAGIC, sizeof CACHEMAGIC - 1))
+  if (memcmp_eq (cache->magic, CACHEMAGIC, sizeof CACHEMAGIC - 1))
     {
       /* This can only be the new format without the old one.  */
       cache_new = (struct cache_file_new *) cache;
 
-      if (memcmp (cache_new->magic, CACHEMAGIC_NEW, sizeof CACHEMAGIC_NEW - 1)
-	  || memcmp (cache_new->version, CACHE_VERSION,
-		      sizeof CACHE_VERSION - 1))
+      if (memcmp_eq (cache_new->magic, CACHEMAGIC_NEW,
+		     sizeof CACHEMAGIC_NEW - 1)
+	  || memcmp_eq (cache_new->version, CACHE_VERSION,
+			sizeof CACHE_VERSION - 1))
 	error (EXIT_FAILURE, 0, _("File is not a cache file.\n"));
       check_new_cache (cache_new);
       format = 1;
@@ -374,10 +375,10 @@  print_cache (const char *cache_name)
 
 	  cache_new = (struct cache_file_new *) ((void *)cache + offset);
 
-	  if (memcmp (cache_new->magic, CACHEMAGIC_NEW,
-		      sizeof CACHEMAGIC_NEW - 1) == 0
-	      && memcmp (cache_new->version, CACHE_VERSION,
-			 sizeof CACHE_VERSION - 1) == 0)
+	  if (memcmp_eq (cache_new->magic, CACHEMAGIC_NEW,
+			 sizeof CACHEMAGIC_NEW - 1) == 0
+	      && memcmp_eq (cache_new->version, CACHE_VERSION,
+			    sizeof CACHE_VERSION - 1) == 0)
 	    {
 	      check_new_cache (cache_new);
 	      cache_data = (const char *) cache_new;
@@ -1030,7 +1031,8 @@  load_aux_cache (const char *aux_cache_name)
     = mmap (NULL, aux_cache_size, PROT_READ, MAP_PRIVATE, fd, 0);
   if (aux_cache == MAP_FAILED
       || aux_cache_size < sizeof (struct aux_cache_file)
-      || memcmp (aux_cache->magic, AUX_CACHEMAGIC, sizeof AUX_CACHEMAGIC - 1)
+      || memcmp_eq (aux_cache->magic, AUX_CACHEMAGIC,
+		    sizeof AUX_CACHEMAGIC - 1)
       || aux_cache_size != (sizeof (struct aux_cache_file)
 			    + aux_cache->nlibs * sizeof (struct aux_cache_file_entry)
 			    + aux_cache->len_strings))
diff --git a/elf/dl-cache.c b/elf/dl-cache.c
index 88bf78ad7c..8574d4ded1 100644
--- a/elf/dl-cache.c
+++ b/elf/dl-cache.c
@@ -72,7 +72,7 @@  glibc_hwcaps_compare (uint32_t left_index, struct dl_hwcaps_priority *right)
     to_compare = left_name_length;
   else
     to_compare = right->name_length;
-  int cmp = memcmp (left_name, right->name, to_compare);
+  int cmp = memcmp_eq (left_name, right->name, to_compare);
   if (cmp != 0)
     return cmp;
   if (left_name_length < right->name_length)
@@ -420,8 +420,8 @@  _dl_load_cache_lookup (const char *name)
 	 - the old format with the new format in it
 	 The following checks if the cache contains any of these formats.  */
       if (file != MAP_FAILED && cachesize > sizeof *cache_new
-	  && memcmp (file, CACHEMAGIC_VERSION_NEW,
-		     sizeof CACHEMAGIC_VERSION_NEW - 1) == 0
+	  && memcmp_eq (file, CACHEMAGIC_VERSION_NEW,
+			sizeof CACHEMAGIC_VERSION_NEW - 1) == 0
 	  /* Check for corruption, avoiding overflow.  */
 	  && ((cachesize - sizeof *cache_new) / sizeof (struct file_entry_new)
 	      >= ((struct cache_file_new *) file)->nlibs))
@@ -435,7 +435,7 @@  _dl_load_cache_lookup (const char *name)
 	  cache = file;
 	}
       else if (file != MAP_FAILED && cachesize > sizeof *cache
-	       && memcmp (file, CACHEMAGIC, sizeof CACHEMAGIC - 1) == 0
+	       && memcmp_eq (file, CACHEMAGIC, sizeof CACHEMAGIC - 1) == 0
 	       /* Check for corruption, avoiding overflow.  */
 	       && ((cachesize - sizeof *cache) / sizeof (struct file_entry)
 		   >= ((struct cache_file *) file)->nlibs))
@@ -450,8 +450,8 @@  _dl_load_cache_lookup (const char *name)
 
 	  cache_new = (struct cache_file_new *) ((void *) cache + offset);
 	  if (cachesize < (offset + sizeof (struct cache_file_new))
-	      || memcmp (cache_new->magic, CACHEMAGIC_VERSION_NEW,
-			 sizeof CACHEMAGIC_VERSION_NEW - 1) != 0)
+	      || memcmp_eq (cache_new->magic, CACHEMAGIC_VERSION_NEW,
+			    sizeof CACHEMAGIC_VERSION_NEW - 1) != 0)
 	      cache_new = (void *) -1;
 	  else
 	    {
diff --git a/elf/dl-diagnostics.c b/elf/dl-diagnostics.c
index d29bdd6904..3181ef5ea8 100644
--- a/elf/dl-diagnostics.c
+++ b/elf/dl-diagnostics.c
@@ -170,7 +170,7 @@  unfiltered_envvar (const char *env, size_t *name_length)
     {
       size_t candidate_length = strlen (candidate);
       if (candidate_length == envname_length
-          && memcmp (candidate, env, candidate_length) == 0)
+          && memcmp_eq (candidate, env, candidate_length) == 0)
         return true;
       candidate += candidate_length + 1;
     }
diff --git a/elf/dl-hwcaps_split.c b/elf/dl-hwcaps_split.c
index 8f5d7a22b2..d69825bbcf 100644
--- a/elf/dl-hwcaps_split.c
+++ b/elf/dl-hwcaps_split.c
@@ -71,7 +71,7 @@  _dl_hwcaps_contains (const char *hwcaps, const char *name, size_t name_length)
   _dl_hwcaps_split_init (&split, hwcaps);
   while (_dl_hwcaps_split (&split))
     if (split.length == name_length
-        && memcmp (split.segment, name, name_length) == 0)
+        && memcmp_eq (split.segment, name, name_length) == 0)
       return true;
   return false;
 }
diff --git a/elf/dl-load.c b/elf/dl-load.c
index 5b0ff41ee1..dd50f4457e 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -170,7 +170,7 @@  is_trusted_path_normalize (const char *path, size_t len)
   for (size_t idx = 0; idx < nsystem_dirs_len; ++idx)
     {
       if (wnp - npath >= system_dirs_len[idx]
-	  && memcmp (trun, npath, system_dirs_len[idx]) == 0)
+	  && memcmp_eq (trun, npath, system_dirs_len[idx]) == 0)
 	/* Found it.  */
 	return true;
 
@@ -507,7 +507,8 @@  fillin_rpath (char *rpath, struct r_search_path_elem **result, const char *sep,
 
       /* See if this directory is already known.  */
       for (dirp = GL(dl_all_dirs); dirp != NULL; dirp = dirp->next)
-	if (dirp->dirnamelen == len && memcmp (cp, dirp->dirname, len) == 0)
+	if (dirp->dirnamelen == len
+	    && memcmp_eq (cp, dirp->dirname, len) == 0)
 	  break;
 
       if (dirp != NULL)
@@ -884,7 +885,7 @@  _dl_process_pt_gnu_property (struct link_map *l, int fd, const ElfW(Phdr) *ph)
       /* Find the NT_GNU_PROPERTY_TYPE_0 note.  */
       if (note->n_namesz == 4
 	  && note->n_type == NT_GNU_PROPERTY_TYPE_0
-	  && memcmp (note + 1, "GNU", 4) == 0)
+	  && memcmp_eq (note + 1, "GNU", 4) == 0)
 	{
 	  /* Check for invalid property.  */
 	  if (note->n_descsz < 8
@@ -1573,7 +1574,7 @@  open_verify (const char *name, int fd,
 #define ELF32_CLASS ELFCLASS32
 #define ELF64_CLASS ELFCLASS64
 #ifndef VALID_ELF_HEADER
-# define VALID_ELF_HEADER(hdr,exp,size)	(memcmp (hdr, exp, size) == 0)
+# define VALID_ELF_HEADER(hdr,exp,size)	(memcmp_eq (hdr, exp, size) == 0)
 # define VALID_ELF_OSABI(osabi)		(osabi == ELFOSABI_SYSV)
 # define VALID_ELF_ABIVERSION(osabi,ver) (ver == 0)
 #elif defined MORE_ELF_HEADER_DATA
@@ -1675,9 +1676,9 @@  open_verify (const char *name, int fd,
 						EI_ABIVERSION)
 			    || !VALID_ELF_ABIVERSION (ehdr->e_ident[EI_OSABI],
 						      ehdr->e_ident[EI_ABIVERSION])
-			    || memcmp (&ehdr->e_ident[EI_PAD],
-				       &expected[EI_PAD],
-				       EI_NIDENT - EI_PAD) != 0))
+			    || memcmp_eq (&ehdr->e_ident[EI_PAD],
+					  &expected[EI_PAD],
+					  EI_NIDENT - EI_PAD) != 0))
 	{
 	  /* Something is wrong.  */
 	  const Elf32_Word *magp = (const void *) ehdr->e_ident;
@@ -1720,8 +1721,8 @@  open_verify (const char *name, int fd,
 	  else if (!VALID_ELF_ABIVERSION (ehdr->e_ident[EI_OSABI],
 					  ehdr->e_ident[EI_ABIVERSION]))
 	    errstring = N_("ELF file ABI version invalid");
-	  else if (memcmp (&ehdr->e_ident[EI_PAD], &expected[EI_PAD],
-			   EI_NIDENT - EI_PAD) != 0)
+	  else if (memcmp_eq (&ehdr->e_ident[EI_PAD], &expected[EI_PAD],
+			       EI_NIDENT - EI_PAD) != 0)
 	    errstring = N_("nonzero padding in e_ident");
 	  else
 	    /* Otherwise we don't know what went wrong.  */
@@ -1802,7 +1803,8 @@  open_verify (const char *name, int fd,
 		  }
 	      }
 
-	    while (memcmp (abi_note, &expected_note, sizeof (expected_note)))
+	    while (memcmp_eq (abi_note, &expected_note,
+			      sizeof (expected_note)))
 	      {
 		ElfW(Addr) note_size
 		  = ELF_NOTE_NEXT_OFFSET (abi_note[0], abi_note[1],
@@ -2193,7 +2195,8 @@  _dl_map_object (struct link_map *loader, const char *name,
 
 		  do
 		    {
-		      if (memcmp (cached, dirp, system_dirs_len[cnt]) == 0)
+		      if (memcmp_eq (cached, dirp,
+				     system_dirs_len[cnt]) == 0)
 			{
 			  /* The prefix matches.  Don't use the entry.  */
 			  free (cached);
diff --git a/elf/dl-profile.c b/elf/dl-profile.c
index 9359be7c33..7bfe558e66 100644
--- a/elf/dl-profile.c
+++ b/elf/dl-profile.c
@@ -416,10 +416,10 @@  _dl_start_profile (void)
   else
     {
       /* Test the signature in the file.  */
-      if (memcmp (addr, &gmon_hdr, sizeof (struct gmon_hdr)) != 0
+      if (memcmp_eq (addr, &gmon_hdr, sizeof (struct gmon_hdr)) != 0
 	  || *(uint32_t *) hist != GMON_TAG_TIME_HIST
-	  || memcmp (hist + sizeof (uint32_t), &hist_hdr,
-		     sizeof (struct gmon_hist_hdr)) != 0
+	  || memcmp_eq (hist + sizeof (uint32_t), &hist_hdr,
+			sizeof (struct gmon_hist_hdr)) != 0
 	  || narcsp[-1] != GMON_TAG_CG_ARC)
 	goto wrong_format;
     }
diff --git a/elf/ldconfig.c b/elf/ldconfig.c
index 57bb95ebc3..94334af932 100644
--- a/elf/ldconfig.c
+++ b/elf/ldconfig.c
@@ -861,8 +861,9 @@  search_dir (const struct dir_entry *entry)
 		      ".#prelink#") == 0)
 	    continue;
 	  if (len >= sizeof (".#prelink#.XXXXXX") - 1
-	      && memcmp (direntry->d_name + len - sizeof (".#prelink#.XXXXXX")
-			 + 1, ".#prelink#.", sizeof (".#prelink#.") - 1) == 0)
+	      && memcmp_eq (direntry->d_name
+			    + len - sizeof (".#prelink#.XXXXXX") + 1,
+			    ".#prelink#.", sizeof (".#prelink#.") - 1) == 0)
 	    continue;
 	}
       len += strlen (entry->path) + 2;
diff --git a/elf/pldd.c b/elf/pldd.c
index f720b2c515..9fe80c874f 100644
--- a/elf/pldd.c
+++ b/elf/pldd.c
@@ -297,7 +297,7 @@  get_process_info (const char *exe, int dfd, long int pid)
 
   close (fd);
 
-  if (memcmp (e_ident, ELFMAG, SELFMAG) != 0)
+  if (memcmp_eq (e_ident, ELFMAG, SELFMAG) != 0)
     {
       error (0, 0, gettext ("process %lu is no ELF program"), pid);
       return EXIT_FAILURE;
diff --git a/elf/readelflib.c b/elf/readelflib.c
index e147416363..2e7cad7419 100644
--- a/elf/readelflib.c
+++ b/elf/readelflib.c
@@ -142,7 +142,7 @@  process_elf_file (const char *file_name, const char *lib, int *flag,
 
 	      while (abi_note [0] != 4 || abi_note [1] != 16
 		     || abi_note [2] != 1
-		     || memcmp (abi_note + 3, "GNU", 4) != 0)
+		     || memcmp_eq (abi_note + 3, "GNU", 4) != 0)
 		{
 		  ElfW(Addr) note_size
 		    = ELF_NOTE_NEXT_OFFSET (abi_note[0], abi_note[1],
@@ -186,7 +186,7 @@  process_elf_file (const char *file_name, const char *lib, int *flag,
 		  /* Find the NT_GNU_PROPERTY_TYPE_0 note.  */
 		  if (note->n_namesz == 4
 		      && note->n_type == NT_GNU_PROPERTY_TYPE_0
-		      && memcmp (note + 1, "GNU", 4) == 0)
+		      && memcmp_eq (note + 1, "GNU", 4) == 0)
 		    {
 		      /* Check for invalid property.  */
 		      if (note->n_descsz < 8
diff --git a/elf/readlib.c b/elf/readlib.c
index 3651dcdd8e..4521fbd16d 100644
--- a/elf/readlib.c
+++ b/elf/readlib.c
@@ -115,7 +115,8 @@  process_file (const char *real_file_name, const char *file_name,
 	{
 	  char buf[SELFMAG];
 	  size_t n = MIN (statbuf.st_size, SELFMAG);
-	  if (fread (buf, n, 1, file) == 1 && memcmp (buf, ELFMAG, n) == 0)
+	  if (fread (buf, n, 1, file) == 1
+	      && memcmp_eq (buf, ELFMAG, n) == 0)
 	    error (0, 0, _("File %s is too small, not checked."), file_name);
 	}
       fclose (file);
@@ -156,7 +157,7 @@  process_file (const char *real_file_name, const char *file_name,
     }
 
   elf_header = (ElfW(Ehdr) *) file_contents;
-  if (memcmp (elf_header->e_ident, ELFMAG, SELFMAG) != 0)
+  if (memcmp_eq (elf_header->e_ident, ELFMAG, SELFMAG) != 0)
     {
       /* The file is neither ELF nor aout.  Check if it's a linker
 	 script, like libc.so - otherwise complain.  Only search the
diff --git a/elf/rtld.c b/elf/rtld.c
index 8dafaf61f4..abddf20fc7 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -2619,7 +2619,7 @@  process_dl_debug (struct dl_main_state *state, const char *dl_debug)
 
 	  for (cnt = 0; cnt < ndebopts; ++cnt)
 	    if (debopts[cnt].len == len
-		&& memcmp (dl_debug, debopts[cnt].name, len) == 0)
+		&& memcmp_eq (dl_debug, debopts[cnt].name, len) == 0)
 	      {
 		GLRO(dl_debug_mask) |= debopts[cnt].mask;
 		state->any_debug = true;
@@ -2697,49 +2697,50 @@  process_envvars (struct dl_main_state *state)
 	{
 	case 4:
 	  /* Warning level, verbose or not.  */
-	  if (memcmp (envline, "WARN", 4) == 0)
+	  if (memcmp_eq (envline, "WARN", 4) == 0)
 	    GLRO(dl_verbose) = envline[5] != '\0';
 	  break;
 
 	case 5:
 	  /* Debugging of the dynamic linker?  */
-	  if (memcmp (envline, "DEBUG", 5) == 0)
+	  if (memcmp_eq (envline, "DEBUG", 5) == 0)
 	    {
 	      process_dl_debug (state, &envline[6]);
 	      break;
 	    }
-	  if (memcmp (envline, "AUDIT", 5) == 0)
+	  if (memcmp_eq (envline, "AUDIT", 5) == 0)
 	    audit_list_add_string (&state->audit_list, &envline[6]);
 	  break;
 
 	case 7:
 	  /* Print information about versions.  */
-	  if (memcmp (envline, "VERBOSE", 7) == 0)
+	  if (memcmp_eq (envline, "VERBOSE", 7) == 0)
 	    {
 	      state->version_info = envline[8] != '\0';
 	      break;
 	    }
 
 	  /* List of objects to be preloaded.  */
-	  if (memcmp (envline, "PRELOAD", 7) == 0)
+	  if (memcmp_eq (envline, "PRELOAD", 7) == 0)
 	    {
 	      state->preloadlist = &envline[8];
 	      break;
 	    }
 
 	  /* Which shared object shall be profiled.  */
-	  if (memcmp (envline, "PROFILE", 7) == 0 && envline[8] != '\0')
+	  if (memcmp_eq (envline, "PROFILE", 7) == 0
+	      && envline[8] != '\0')
 	    GLRO(dl_profile) = &envline[8];
 	  break;
 
 	case 8:
 	  /* Do we bind early?  */
-	  if (memcmp (envline, "BIND_NOW", 8) == 0)
+	  if (memcmp_eq (envline, "BIND_NOW", 8) == 0)
 	    {
 	      GLRO(dl_lazy) = envline[9] == '\0';
 	      break;
 	    }
-	  if (memcmp (envline, "BIND_NOT", 8) == 0)
+	  if (memcmp_eq (envline, "BIND_NOT", 8) == 0)
 	    GLRO(dl_bind_not) = envline[9] != '\0';
 	  break;
 
@@ -2747,7 +2748,7 @@  process_envvars (struct dl_main_state *state)
 	  /* Test whether we want to see the content of the auxiliary
 	     array passed up from the kernel.  */
 	  if (!__libc_enable_secure
-	      && memcmp (envline, "SHOW_AUXV", 9) == 0)
+	      && memcmp_eq (envline, "SHOW_AUXV", 9) == 0)
 	    _dl_show_auxv ();
 	  break;
 
@@ -2755,7 +2756,7 @@  process_envvars (struct dl_main_state *state)
 	case 10:
 	  /* Mask for the important hardware capabilities.  */
 	  if (!__libc_enable_secure
-	      && memcmp (envline, "HWCAP_MASK", 10) == 0)
+	      && memcmp_eq (envline, "HWCAP_MASK", 10) == 0)
 	    GLRO(dl_hwcap_mask) = _dl_strtoul (&envline[11], NULL);
 	  break;
 #endif
@@ -2763,14 +2764,14 @@  process_envvars (struct dl_main_state *state)
 	case 11:
 	  /* Path where the binary is found.  */
 	  if (!__libc_enable_secure
-	      && memcmp (envline, "ORIGIN_PATH", 11) == 0)
+	      && memcmp_eq (envline, "ORIGIN_PATH", 11) == 0)
 	    GLRO(dl_origin_path) = &envline[12];
 	  break;
 
 	case 12:
 	  /* The library search path.  */
 	  if (!__libc_enable_secure
-	      && memcmp (envline, "LIBRARY_PATH", 12) == 0)
+	      && memcmp_eq (envline, "LIBRARY_PATH", 12) == 0)
 	    {
 	      state->library_path = &envline[13];
 	      state->library_path_source = "LD_LIBRARY_PATH";
@@ -2778,14 +2779,14 @@  process_envvars (struct dl_main_state *state)
 	    }
 
 	  /* Where to place the profiling data file.  */
-	  if (memcmp (envline, "DEBUG_OUTPUT", 12) == 0)
+	  if (memcmp_eq (envline, "DEBUG_OUTPUT", 12) == 0)
 	    {
 	      debug_output = &envline[13];
 	      break;
 	    }
 
 	  if (!__libc_enable_secure
-	      && memcmp (envline, "DYNAMIC_WEAK", 12) == 0)
+	      && memcmp_eq (envline, "DYNAMIC_WEAK", 12) == 0)
 	    GLRO(dl_dynamic_weak) = 1;
 	  break;
 
@@ -2796,7 +2797,7 @@  process_envvars (struct dl_main_state *state)
 	  EXTRA_LD_ENVVARS_13
 #endif
 	  if (!__libc_enable_secure
-	      && memcmp (envline, "USE_LOAD_BIAS", 13) == 0)
+	      && memcmp_eq (envline, "USE_LOAD_BIAS", 13) == 0)
 	    {
 	      GLRO(dl_use_load_bias) = envline[14] == '1' ? -1 : 0;
 	      break;
@@ -2806,14 +2807,14 @@  process_envvars (struct dl_main_state *state)
 	case 14:
 	  /* Where to place the profiling data file.  */
 	  if (!__libc_enable_secure
-	      && memcmp (envline, "PROFILE_OUTPUT", 14) == 0
+	      && memcmp_eq (envline, "PROFILE_OUTPUT", 14) == 0
 	      && envline[15] != '\0')
 	    GLRO(dl_profile_output) = &envline[15];
 	  break;
 
 	case 16:
 	  /* The mode of the dynamic linker can be set.  */
-	  if (memcmp (envline, "TRACE_PRELINKING", 16) == 0)
+	  if (memcmp_eq (envline, "TRACE_PRELINKING", 16) == 0)
 	    {
 	      state->mode = rtld_mode_trace;
 	      GLRO(dl_verbose) = 1;
@@ -2824,7 +2825,7 @@  process_envvars (struct dl_main_state *state)
 
 	case 20:
 	  /* The mode of the dynamic linker can be set.  */
-	  if (memcmp (envline, "TRACE_LOADED_OBJECTS", 20) == 0)
+	  if (memcmp_eq (envline, "TRACE_LOADED_OBJECTS", 20) == 0)
 	    state->mode = rtld_mode_trace;
 	  break;
 
diff --git a/elf/sprof.c b/elf/sprof.c
index 405fbcbf38..08afe170bd 100644
--- a/elf/sprof.c
+++ b/elf/sprof.c
@@ -889,22 +889,22 @@  load_profdata (const char *name, struct shobj *shobj)
   hist_hdr.dimen_abbrev = 's';
 
   /* Test whether the header of the profiling data is ok.  */
-  if (memcmp (addr, &gmon_hdr, sizeof (struct gmon_hdr)) != 0
+  if (memcmp_eq (addr, &gmon_hdr, sizeof (struct gmon_hdr)) != 0
       || *(uint32_t *) result->hist != GMON_TAG_TIME_HIST
-      || memcmp (result->hist_hdr, &hist_hdr,
-		 sizeof (struct gmon_hist_hdr)) != 0
+      || memcmp_eq (result->hist_hdr, &hist_hdr,
+		    sizeof (struct gmon_hist_hdr)) != 0
       || narcsp[-1] != GMON_TAG_CG_ARC)
     {
       error (0, 0, _("`%s' is no correct profile data file for `%s'"),
 	     name, shobj->name);
       if (do_test)
 	{
-	  if (memcmp (addr, &gmon_hdr, sizeof (struct gmon_hdr)) != 0)
+	  if (memcmp_eq (addr, &gmon_hdr, sizeof (struct gmon_hdr)) != 0)
 	    puts ("gmon_hdr differs");
 	  if (*(uint32_t *) result->hist != GMON_TAG_TIME_HIST)
 	    puts ("result->hist differs");
-	  if (memcmp (result->hist_hdr, &hist_hdr,
-		      sizeof (struct gmon_hist_hdr)) != 0)
+	  if (memcmp_eq (result->hist_hdr, &hist_hdr,
+			 sizeof (struct gmon_hist_hdr)) != 0)
 	    puts ("hist_hdr differs");
 	  if (narcsp[-1] != GMON_TAG_CG_ARC)
 	    puts ("narcsp[-1] differs");
diff --git a/include/string.h b/include/string.h
index 21f641a413..6a39fb1ec6 100644
--- a/include/string.h
+++ b/include/string.h
@@ -160,6 +160,7 @@  extern __typeof (__strsep_g) __strsep_g attribute_hidden;
 
 extern __typeof (memchr) memchr attribute_hidden;
 extern __typeof (memcmp) memcmp attribute_hidden;
+extern __typeof (__memcmpeq) __memcmpeq attribute_hidden;
 extern __typeof (memcpy) memcpy attribute_hidden;
 extern __typeof (memmove) memmove attribute_hidden;
 extern __typeof (memset) memset attribute_hidden;
@@ -172,6 +173,8 @@  extern __typeof (strnlen) strnlen attribute_hidden;
 extern __typeof (strsep) strsep attribute_hidden;
 #endif
 
+#include <memcmp-eq.h>
+
 #if (!IS_IN (libc) || !defined SHARED) \
   && !defined NO_MEMPCPY_STPCPY_REDIRECT
 /* Redirect calls to __builtin_mempcpy and __builtin_stpcpy to call
diff --git a/sysdeps/generic/memcmp-eq.h b/sysdeps/generic/memcmp-eq.h
new file mode 100644
index 0000000000..07e723fef5
--- /dev/null
+++ b/sysdeps/generic/memcmp-eq.h
@@ -0,0 +1,24 @@ 
+/* Function to compare byte sequences for equality.  Generic version.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+
+static __always_inline int
+memcmp_eq (const void *s1, const void *s2, size_t n)
+{
+  return memcmp (s1, s2, n);
+}
diff --git a/sysdeps/x86_64/memcmp-eq.h b/sysdeps/x86_64/memcmp-eq.h
new file mode 100644
index 0000000000..3d907948f0
--- /dev/null
+++ b/sysdeps/x86_64/memcmp-eq.h
@@ -0,0 +1,26 @@ 
+/* Function to compare byte sequences for equality.  x86-64 version.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+static __always_inline int
+memcmp_eq (const void *s1, const void *s2, size_t n)
+{
+  /* Compiler can better optimize the constant size.  */
+  if (__builtin_constant_p (n))
+    return memcmp (s1, s2, n);
+  return __memcmpeq (s1, s2, n);
+}
diff --git a/sysdeps/x86_64/multiarch/memcmpeq-sse2.S b/sysdeps/x86_64/multiarch/memcmpeq-sse2.S
index b80a29d4b0..de7f5a7525 100644
--- a/sysdeps/x86_64/multiarch/memcmpeq-sse2.S
+++ b/sysdeps/x86_64/multiarch/memcmpeq-sse2.S
@@ -16,8 +16,10 @@ 
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#ifndef memcmp
+#if IS_IN (libc)
 # define memcmp	__memcmpeq_sse2
+#else
+# define memcmp	__memcmpeq
 #endif
 #define USE_AS_MEMCMPEQ	1
 #include "memcmp-sse2.S"