diff mbox

[RFC] Avoid cltz in comparison functions.

Message ID 20150601085546.GA17458@domone
State New
Headers show

Commit Message

Ondřej Bílka June 1, 2015, 8:55 a.m. UTC
On Mon, Jun 01, 2015 at 12:01:31AM +0200, Ondřej Bílka wrote:
> And here is updated version of comparing functions.
> 
I recalled another trick that I knew but couldn't use for sse2 code.

A trick for strcmp is that you don't need access individual bytes to get
inequality. Just mask bytes after first difference and compare them.
That could be done with following.

  // find first bit where they differ.
  mask = (x ^ y) ^ ((x ^ y) - 1)

  // set whole bytes of mask
  mask = (mask & ones * 255)

  // also remove bytes after terminating 0
  mask &= zero ^ (zero -1);

  if (x & mask > y & mask)
    return 1;
  if (x & mask < y & mask)
    return -1;
  return 0;

Then you could replace these checks with branchless ones as I did in
following RFC.

A problem is that it would make diff skeleton more messy as sometimes
its used for finding byte location like in strcasecmp.

So how is best way to integrate tricks like that?
diff mbox

Patch

diff --git a/sysdeps/generic/string_vector.h b/sysdeps/generic/string_vector.h
index a3c4741..18ca118 100644
--- a/sysdeps/generic/string_vector.h
+++ b/sysdeps/generic/string_vector.h
@@ -191,6 +191,47 @@  bytes_equal_nocarry (vector_int x, vector_int y)
 }
 #endif
 
+#ifndef CUSTOM_COMPARE_BYTES
+# if __BYTE_ORDER == __LITTLE_ENDIAN
+static __always_inline
+int
+compare_bytes (vector_int x, vector_int y, vector_int zero_mask)
+{
+  /* Removes bytes beyond terminating '\0'. When zero mask its 0 then it 
+     evaluates to ~0 and doesn't mask anything.  */
+  zero_mask = zero_mask ^ (zero_mask - 1);
+
+  /* Find first bit that differs. Make mask with bits upto that bit set 
+     to 1.  */
+  vector_int and_mask = (((x ^ y) ^ ((x ^ y) - 1)) 
+
+  /* With difference in highest bit we can't compare them by subtraction.  */
+
+  if (and_mask == ~((vector_int) 0))
+    {
+      if (x > y)
+        return 1;
+      if (x < y)
+        return -1;
+      return 0;
+    }
+
+  /* Convert mask to bytewise one that has in given byte 255 or 0 if bitwise
+     mask had some bit selected or not.  */
+
+  and_mask = (and_mask & ones) * 255;
+
+  and_mask = and_mask & zero_mask;
+  
+  /* As check above ensures that difference fits signed vector_int we could 
+     subtract them.  */
+
+  vector_int diff = (x & and_mask) - (y_greater & and_mask);
+  return (int)(diff >> (8 * (LSIZE - sizeof (int))));
+}
+# endif
+#endif
+
 /*
   When you have hardware ctz/clz its probably best bet. However
   for softare emulation you could get better than generic one as you