Message ID | 1405488709-12677-2-git-send-email-andi@firstfloor.org |
---|---|
State | New |
Headers | show |
> +++ b/gcc/inchash.h > +class inchash > +{ > + hashval_t val; normal style would be explicit private: at the end. > + public: > + > + /* Start incremential hashing, optionally with SEED. */ > + void begin (hashval_t seed = 0) > + { > + val = seed; why isn't this the ctor? > + /* Add unsigned value V. */ > + void add_int (unsigned v) > + { > + val = iterative_hash_hashval_t (v, val); > + } istm this is a great spot to provide a bunch of overloads of just add() and let the compiler pick the appropriate one for your type. thanks! Trev
On Wed, Jul 16, 2014 at 10:40:53PM -0400, Trevor Saunders wrote: > > +++ b/gcc/inchash.h > > +class inchash > > +{ > > + hashval_t val; > > normal style would be explicit private: at the end. Ok. > > > + public: > > + > > + /* Start incremential hashing, optionally with SEED. */ > > + void begin (hashval_t seed = 0) > > + { > > + val = seed; > > why isn't this the ctor? It's standard for hash classes to have explicit begin()/end(). All the existing ones I've seen work this way. > > > + /* Add unsigned value V. */ > > + void add_int (unsigned v) > > + { > > + val = iterative_hash_hashval_t (v, val); > > + } > > istm this is a great spot to provide a bunch of overloads of just add() > and let the compiler pick the appropriate one for your type. Sorry I'm not into code obfuscation. With hashing it's far better to work with explicit visible types instead of invisible magic. -Andi
On Thu, Jul 17, 2014 at 06:36:31AM +0200, Andi Kleen wrote: > On Wed, Jul 16, 2014 at 10:40:53PM -0400, Trevor Saunders wrote: > > > > > + public: > > > + > > > + /* Start incremential hashing, optionally with SEED. */ > > > + void begin (hashval_t seed = 0) > > > + { > > > + val = seed; > > > > why isn't this the ctor? > > It's standard for hash classes to have explicit begin()/end(). > All the existing ones I've seen work this way. I only know of one vaguelly similar thing http://mxr.mozilla.org/mozilla-central/source/mfbt/SHA1.h#37 which doesn't do that, and a bunch of people doing something doesn't necessarily mean it makes sense. Now there may be a good reason it does make sense, but unless these other people need begin() to be fallible I don't see it. > > > + /* Add unsigned value V. */ > > > + void add_int (unsigned v) > > > + { > > > + val = iterative_hash_hashval_t (v, val); > > > + } > > > > istm this is a great spot to provide a bunch of overloads of just add() > > and let the compiler pick the appropriate one for your type. > > Sorry I'm not into code obfuscation. With hashing it's far better > to work with explicit visible types instead of invisible magic. if that were true I'd expect you'd see lots of cases of people using a different hash function than the type of the expression being passed, but a quick look at the later patches didn't show me any of those. Not repeating the type of something is hardly obfiscation, and in most cases there's only one sane function to call for a given expression. but I guess its easy enough to change later if somebody gets really annoyed by it so whatever. Trev > > -Andi
On 07/15/14 23:31, Andi Kleen wrote: > From: Andi Kleen <ak@linux.intel.com> > > Some files in gcc, like lto or tree, do large scale incremential hashing. > The current jhash implementation of this could be likely improved > by using an incremential hash that does not do a full rehashing > for every new value added. > > This patch adds a new "inchash" class that abstracts the internal > state of the hash. This makes it easier to plug in new hashes > and also cleans up the code a bit. > > Right now it is just implemented in the same way as the old > iterative hash in tree.c. The previous iterative hash code > from tree.c moved into a new separate file. Also I fixed up all > users to include the new header. > > It should not really change any hashing by itself, it's just > a cleanup at this point. > > gcc/: > > 2014-07-10 Andi Kleen <ak@linux.intel.com> > > * Makefile.in (OBJS): Add inchash.o. > (PLUGIN_HEADERS): Add inchash.h. > * ipa-devirt.c: Include inchash.h. > * lto-streamer-out.c: Dito. > * tree-ssa-dom.c: Dito. > * tree-ssa-pre.c: Dito. > * tree-ssa-sccvn.c: Dito. > * tree-ssa-tail-merge.c: Dito. > * asan.c: Dito. > * tree.c (iterative_hash_hashval_t): Move to ... > (iterative_hash_host_wide_int): Move to ... > * inchash.c: Here. New file. > * tree.h (iterative_hash_hashval_t): Move to ... > (iterative_hash_host_wide_int): Move to ... > * inchash.h: Here. New file. If there isn't any concrete benefit from using explicit begin/end, I'd just initialize the data in the ctor. Otherwise, this is a fine cleanup. Presumably there's going to be some better hashing functions coming down the pipe? Jeff
On Fri, Jul 18, 2014 at 3:08 AM, Trevor Saunders <tsaunders@mozilla.com> wrote: > On Thu, Jul 17, 2014 at 06:36:31AM +0200, Andi Kleen wrote: >> On Wed, Jul 16, 2014 at 10:40:53PM -0400, Trevor Saunders wrote: >> > >> > > + public: >> > > + >> > > + /* Start incremential hashing, optionally with SEED. */ >> > > + void begin (hashval_t seed = 0) >> > > + { >> > > + val = seed; >> > >> > why isn't this the ctor? >> >> It's standard for hash classes to have explicit begin()/end(). >> All the existing ones I've seen work this way. > > I only know of one vaguelly similar thing > http://mxr.mozilla.org/mozilla-central/source/mfbt/SHA1.h#37 which > doesn't do that, and a bunch of people doing something doesn't > necessarily mean it makes sense. Now there may be a good reason it > does make sense, but unless these other people need begin() to be > fallible I don't see it. I agree with Trevor here. Please make begin() the constructor. Btw, what will be the way to plug in an alternative hash function? That is, there doesn't seem to be a separation of interface and implementation in your patch (like with a template or a base-class you inherit from). Richard. >> > > + /* Add unsigned value V. */ >> > > + void add_int (unsigned v) >> > > + { >> > > + val = iterative_hash_hashval_t (v, val); >> > > + } >> > >> > istm this is a great spot to provide a bunch of overloads of just add() >> > and let the compiler pick the appropriate one for your type. >> >> Sorry I'm not into code obfuscation. With hashing it's far better >> to work with explicit visible types instead of invisible magic. > > if that were true I'd expect you'd see lots of cases of people using a > different hash function than the type of the expression being passed, > but a quick look at the later patches didn't show me any of those. > Not repeating the type of something is hardly obfiscation, and in most > cases there's only one sane function to call for a given expression. > > but I guess its easy enough to change later if somebody gets really > annoyed by it so whatever. > > Trev > >> >> -Andi
> Btw, what will be the way to plug in an alternative hash function? > That is, there doesn't seem to be a separation of interface > and implementation in your patch (like with a template or a base-class > you inherit from). Just change the inchash.h include file. The point was to only change a single place. Inheritance would need changing everything again for the new type (unless we came up with hash factories that would likely defeat virtualization and would be over engineering) -Andi
On July 23, 2014 4:27:43 PM CEST, Andi Kleen <ak@linux.intel.com> wrote: >> Btw, what will be the way to plug in an alternative hash function? >> That is, there doesn't seem to be a separation of interface >> and implementation in your patch (like with a template or a >base-class >> you inherit from). > >Just change the inchash.h include file. The point was to only >change a single place. So there will be at most one hash implementation? Maybe use a namespace instead of a hash then? So other places can extend it? Why didn't you replace the tree.c uses BTW? Thanks, Richard. >Inheritance would need changing everything again for the new type >(unless we came up with hash factories that would likely defeat >virtualization and would be over engineering) > >-Andi
> So there will be at most one hash implementation? One per binary I expect. Modern hash functions are pretty good, so it's unlikely that someone needs to come up with special purpose hashes. I found Bob Jenkins' spooky is rather good for this case (very large incremential keys), but it is only efficient on 64bit hosts. So it would need a fallback (like mumurhash2a or the existing one) on 32bit. Another alternative would be to use CityHash, but that also has multiple variants, including some machine specific ones (e.g. use the CRC instructions on SSE2 hosts) > Maybe use a namespace instead of a hash then? I don't understand the suggestion. > So other places can extend it? Not sure that is needed. > Why didn't you replace the tree.c uses BTW? Patches were already quite big, but I'll add it. -Andi
> > Why didn't you replace the tree.c uses BTW? > > Patches were already quite big, but I'll add it. Actually I handled them all in tree.c. Did you mean something else? I didn't convert all of tree-ssa-* and dwarf* so far, and a few other places. This can be done step by step. -Andi
On Wed, 2014-07-23 at 11:37 +0200, Richard Biener wrote: > On Fri, Jul 18, 2014 at 3:08 AM, Trevor Saunders <tsaunders@mozilla.com> wrote: > > On Thu, Jul 17, 2014 at 06:36:31AM +0200, Andi Kleen wrote: > >> On Wed, Jul 16, 2014 at 10:40:53PM -0400, Trevor Saunders wrote: > >> > > >> > > + public: > >> > > + > >> > > + /* Start incremential hashing, optionally with SEED. */ > >> > > + void begin (hashval_t seed = 0) > >> > > + { > >> > > + val = seed; > >> > > >> > why isn't this the ctor? > >> > >> It's standard for hash classes to have explicit begin()/end(). > >> All the existing ones I've seen work this way. > > > > I only know of one vaguelly similar thing > > http://mxr.mozilla.org/mozilla-central/source/mfbt/SHA1.h#37 which > > doesn't do that, and a bunch of people doing something doesn't > > necessarily mean it makes sense. Now there may be a good reason it > > does make sense, but unless these other people need begin() to be > > fallible I don't see it. > > I agree with Trevor here. Please make begin() the constructor. > Btw, what will be the way to plug in an alternative hash function? > That is, there doesn't seem to be a separation of interface > and implementation in your patch (like with a template or a base-class > you inherit from). Isn't that what boost / std hash is actually doing? Maybe something like the attached example could be an improvement? As with boost / std hash, the hash function is a template functor that can be specialized on various types. The 'incremental_hash' simply plugs together a hash value combiner and a hash function. Types that are not implemented by the hash function (in this example) are caught at compile time, since implicit conversions do not take place. However, it should also possible to write hash functions with a bunch of operator () overloads to utilize implicit conversions. One advantage of this construct is that new types can be added along with hash function specializations, without the need to touch any of the existing hashing facilities. I don't know how/whether this would fit with the already existing hash map stuff (hash-table.h, hash-map.h). It seems those don't support user specified hash functions. Cheers, Oleg
On Thu, Jul 24, 2014 at 02:00:24AM +0200, Oleg Endo wrote: > On Wed, 2014-07-23 at 11:37 +0200, Richard Biener wrote: > > On Fri, Jul 18, 2014 at 3:08 AM, Trevor Saunders <tsaunders@mozilla.com> wrote: > > > On Thu, Jul 17, 2014 at 06:36:31AM +0200, Andi Kleen wrote: > > >> On Wed, Jul 16, 2014 at 10:40:53PM -0400, Trevor Saunders wrote: > > >> > > > >> > > + public: > > >> > > + > > >> > > + /* Start incremential hashing, optionally with SEED. */ > > >> > > + void begin (hashval_t seed = 0) > > >> > > + { > > >> > > + val = seed; > > >> > > > >> > why isn't this the ctor? > > >> > > >> It's standard for hash classes to have explicit begin()/end(). > > >> All the existing ones I've seen work this way. > > > > > > I only know of one vaguelly similar thing > > > http://mxr.mozilla.org/mozilla-central/source/mfbt/SHA1.h#37 which > > > doesn't do that, and a bunch of people doing something doesn't > > > necessarily mean it makes sense. Now there may be a good reason it > > > does make sense, but unless these other people need begin() to be > > > fallible I don't see it. > > > > I agree with Trevor here. Please make begin() the constructor. > > Btw, what will be the way to plug in an alternative hash function? > > That is, there doesn't seem to be a separation of interface > > and implementation in your patch (like with a template or a base-class > > you inherit from). > > Isn't that what boost / std hash is actually doing? Maybe something > like the attached example could be an improvement? I don't really see why that makes it any easier to plug in a different hash algorithm > As with boost / std hash, the hash function is a template functor that > can be specialized on various types. The 'incremental_hash' simply > plugs together a hash value combiner and a hash function. Types that > are not implemented by the hash function (in this example) are caught at > compile time, since implicit conversions do not take place. However, it > should also possible to write hash functions with a bunch of operator () > overloads to utilize implicit conversions. at that point it seems like its simpler and equivelent to just overload a global function. > One advantage of this construct is that new types can be added along > with hash function specializations, without the need to touch any of the > existing hashing facilities. > > I don't know how/whether this would fit with the already existing hash > map stuff (hash-table.h, hash-map.h). It seems those don't support user > specified hash functions. they do, see e.g. graphite-htab.h Trev > > Cheers, > Oleg > > typedef unsigned int hashval_t; > typedef long long HOST_WIDE_INT; > typedef unsigned int size_t; > > // ----------------------------- > // hash function and some specializations > > struct raw_data > { > const void* data; > size_t len; > > raw_data (const void* d, size_t l) : data (d), len (l) { } > > template <typename T> > explicit raw_data (const T& val) : data (&val), len (sizeof (T)) { } > }; > > template <typename T> struct my_hash; > > template <> struct my_hash<unsigned int> > { > hashval_t operator () (unsigned int val); > }; > > template <> struct my_hash<HOST_WIDE_INT> > { > hashval_t operator () (HOST_WIDE_INT val); > }; > > template <> struct my_hash<raw_data> > { > hashval_t operator () (raw_data val); > }; > > template <> struct my_hash<const void*> > { > hashval_t operator () (const void* val); > }; > > template <typename T> struct my_hash<T*> > { > hashval_t operator () (T* val) > { > return my_hash<const void*> () (val); > } > }; > > template <typename T> struct my_hash<const T*> > { > hashval_t operator () (const T* val) > { > return my_hash<const void*> () (val); > } > }; > > // ----------------------------- > // a possible hash combiner > struct my_hash_combiner > { > hashval_t operator () (hashval_t a, hashval_t b); > }; > > > // ----------------------------- > // generic incremental hash > > template <template<typename> class HashFunction, class HashValueCombiner> > class incremental_hash > { > public: > incremental_hash (void) : m_value (0) { } > incremental_hash (hashval_t seed) : m_value (seed) { } > > hashval_t value (void) const { return m_value; } > > template <typename T> incremental_hash& add (const T& val) > { > m_value = HashValueCombiner () (m_value, HashFunction<T> () (val)); > return *this; > } > > private: > hashval_t m_value; > }; > > // hash function of an incremental_hash is its current value, regardless > // of the actual hash function used. > template <template<typename> class Func, class Combiner> > struct my_hash< incremental_hash<Func, Combiner> > > { > hashval_t operator () (const incremental_hash<Func, Combiner>& val) > { > return val.value (); > } > }; > > > // ----------------------------- > // use case > > struct my_object > { > int a, b, c, d; > }; > > typedef incremental_hash<my_hash, my_hash_combiner> my_inc_hash; > > int main (int argc, const char* argv[]) > { > my_object some_obj; > > my_inc_hash hash; > > hash.add (5U) > .add<unsigned int> (234) // explicit hash func, implicit conversion. > .add (HOST_WIDE_INT (2135)) > .add (raw_data (argv, argc)) // raw data hash > .add (raw_data (some_obj)) // iterative_hash_object macro > .add (argv) // pointer hash > .add (argv[0]); // pointer hash > > my_inc_hash hash2 (0x5533); > hash2.add (123U) > .add (hash); // 'merge' two incremental hashes. > > return hash2.value (); > }
On 24 Jul 2014, at 03:48, Trevor Saunders <tsaunders@mozilla.com> wrote: > On Thu, Jul 24, 2014 at 02:00:24AM +0200, Oleg Endo wrote: >> On Wed, 2014-07-23 at 11:37 +0200, Richard Biener wrote: >>> On Fri, Jul 18, 2014 at 3:08 AM, Trevor Saunders <tsaunders@mozilla.com> wrote: >>>> On Thu, Jul 17, 2014 at 06:36:31AM +0200, Andi Kleen wrote: >>>>> On Wed, Jul 16, 2014 at 10:40:53PM -0400, Trevor Saunders wrote: >>>>>> >>>>>>> + public: >>>>>>> + >>>>>>> + /* Start incremential hashing, optionally with SEED. */ >>>>>>> + void begin (hashval_t seed = 0) >>>>>>> + { >>>>>>> + val = seed; >>>>>> >>>>>> why isn't this the ctor? >>>>> >>>>> It's standard for hash classes to have explicit begin()/end(). >>>>> All the existing ones I've seen work this way. >>>> >>>> I only know of one vaguelly similar thing >>>> http://mxr.mozilla.org/mozilla-central/source/mfbt/SHA1.h#37 which >>>> doesn't do that, and a bunch of people doing something doesn't >>>> necessarily mean it makes sense. Now there may be a good reason it >>>> does make sense, but unless these other people need begin() to be >>>> fallible I don't see it. >>> >>> I agree with Trevor here. Please make begin() the constructor. >>> Btw, what will be the way to plug in an alternative hash function? >>> That is, there doesn't seem to be a separation of interface >>> and implementation in your patch (like with a template or a base-class >>> you inherit from). >> >> Isn't that what boost / std hash is actually doing? Maybe something >> like the attached example could be an improvement? > > I don't really see why that makes it any easier to plug in a different > hash algorithm typedef incremental_hash<my_hash, my_hash_combiner> my_inc_hash; ^^^^^^^ replace with a different class True, it's not any 'easier' than replacing any other name in the code. It makes it easier to mix different type dependent hash functions/ algorithms, though. The way I understood it, Richard asked how to replace the hash function for the incremental hash class. > >> As with boost / std hash, the hash function is a template functor that >> can be specialized on various types. The 'incremental_hash' simply >> plugs together a hash value combiner and a hash function. Types that >> are not implemented by the hash function (in this example) are caught at >> compile time, since implicit conversions do not take place. However, it >> should also possible to write hash functions with a bunch of operator () >> overloads to utilize implicit conversions. > > at that point it seems like its simpler and equivelent to just overload > a global function. AFAIR, there are some cases which are difficult to handle with overloads (implicit conversions etc). It could be a freestanding template function that is specialized for the types in question. > >> One advantage of this construct is that new types can be added along >> with hash function specializations, without the need to touch any of the >> existing hashing facilities. >> >> I don't know how/whether this would fit with the already existing hash >> map stuff (hash-table.h, hash-map.h). It seems those don't support user >> specified hash functions. > > they do, see e.g. graphite-htab.h Aha. I see. Cheers, Oleg
On Wed, Jul 23, 2014 at 9:56 PM, Andi Kleen <andi@firstfloor.org> wrote: >> > Why didn't you replace the tree.c uses BTW? >> >> Patches were already quite big, but I'll add it. > > Actually I handled them all in tree.c. Did you > mean something else? I meant transform the iterative_hash_* _implementations_ to use the interface. But maybe I missed sth in the patch(es). > I didn't convert all of tree-ssa-* and dwarf* so far, > and a few other places. This can be done step by step. Sure. Can you re-post the adjusted patch introducing inchash.[ch]? Thanks, Richard. > -Andi
diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 187e6b6..4c578b3 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1268,6 +1268,7 @@ OBJS = \ hwint.o \ ifcvt.o \ ree.o \ + inchash.o \ incpath.o \ init-regs.o \ internal-fn.o \ @@ -3162,7 +3163,7 @@ PLUGIN_HEADERS = $(TREE_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ tree-parloops.h tree-ssa-address.h tree-ssa-coalesce.h tree-ssa-dom.h \ tree-ssa-loop.h tree-ssa-loop-ivopts.h tree-ssa-loop-manip.h \ tree-ssa-loop-niter.h tree-ssa-ter.h tree-ssa-threadedge.h \ - tree-ssa-threadupdate.h + tree-ssa-threadupdate.h inchash.h # generate the 'build fragment' b-header-vars s-header-vars: Makefile diff --git a/gcc/asan.c b/gcc/asan.c index b9a4a91..b5c8f91 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -29,6 +29,7 @@ along with GCC; see the file COPYING3. If not see #include "internal-fn.h" #include "gimple-expr.h" #include "is-a.h" +#include "inchash.h" #include "gimple.h" #include "gimplify.h" #include "gimple-iterator.h" diff --git a/gcc/inchash.c b/gcc/inchash.c new file mode 100644 index 0000000..0f8583e --- /dev/null +++ b/gcc/inchash.c @@ -0,0 +1,75 @@ +/* Incremential hashing for jhash. + Copyright (C) 2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "hashtab.h" +#include "inchash.h" + +/* Borrowed from hashtab.c iterative_hash implementation. */ +#define mix(a,b,c) \ +{ \ + a -= b; a -= c; a ^= (c>>13); \ + b -= c; b -= a; b ^= (a<< 8); \ + c -= a; c -= b; c ^= ((b&0xffffffff)>>13); \ + a -= b; a -= c; a ^= ((c&0xffffffff)>>12); \ + b -= c; b -= a; b = (b ^ (a<<16)) & 0xffffffff; \ + c -= a; c -= b; c = (c ^ (b>> 5)) & 0xffffffff; \ + a -= b; a -= c; a = (a ^ (c>> 3)) & 0xffffffff; \ + b -= c; b -= a; b = (b ^ (a<<10)) & 0xffffffff; \ + c -= a; c -= b; c = (c ^ (b>>15)) & 0xffffffff; \ +} + + +/* Produce good hash value combining VAL and VAL2. */ +hashval_t +iterative_hash_hashval_t (hashval_t val, hashval_t val2) +{ + /* the golden ratio; an arbitrary value. */ + hashval_t a = 0x9e3779b9; + + mix (a, val, val2); + return val2; +} + +/* Produce good hash value combining VAL and VAL2. */ + +hashval_t +iterative_hash_host_wide_int (HOST_WIDE_INT val, hashval_t val2) +{ + if (sizeof (HOST_WIDE_INT) == sizeof (hashval_t)) + return iterative_hash_hashval_t (val, val2); + else + { + hashval_t a = (hashval_t) val; + /* Avoid warnings about shifting of more than the width of the type on + hosts that won't execute this path. */ + int zero = 0; + hashval_t b = (hashval_t) (val >> (sizeof (hashval_t) * 8 + zero)); + mix (a, b, val2); + if (sizeof (HOST_WIDE_INT) > 2 * sizeof (hashval_t)) + { + hashval_t a = (hashval_t) (val >> (sizeof (hashval_t) * 16 + zero)); + hashval_t b = (hashval_t) (val >> (sizeof (hashval_t) * 24 + zero)); + mix (a, b, val2); + } + return val2; + } +} diff --git a/gcc/inchash.h b/gcc/inchash.h new file mode 100644 index 0000000..071cf92 --- /dev/null +++ b/gcc/inchash.h @@ -0,0 +1,86 @@ +/* An incremental hash abstract data type. + Copyright (C) 2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#ifndef INCHASH_H +#define INCHASH_H 1 + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "hashtab.h" + +/* This file implements an incremential hash function ADT, to be used + by code that incrementially hashes a lot of unrelated data + (not in a single memory block) into a single value. The goal + is to make it easy to plug in efficient hash algorithms. + Currently it just implements the plain old jhash based + incremental hash from gcc's tree.c. */ + +extern hashval_t iterative_hash_host_wide_int (HOST_WIDE_INT, hashval_t); +extern hashval_t iterative_hash_hashval_t (hashval_t, hashval_t); + +class inchash +{ + hashval_t val; + public: + + /* Start incremential hashing, optionally with SEED. */ + void begin (hashval_t seed = 0) + { + val = seed; + } + + /* End incremential hashing and provide the final value. */ + hashval_t end () + { + return val; + } + + /* Add unsigned value V. */ + void add_int (unsigned v) + { + val = iterative_hash_hashval_t (v, val); + } + + /* Add HOST_WIDE_INT value V. */ + void add_wide_int (HOST_WIDE_INT v) + { + val = iterative_hash_host_wide_int (v, val); + } + + /* Hash in pointer PTR. */ + void add_ptr (void *ptr) + { + add (&ptr, sizeof (ptr)); + } + + /* Add a memory block DATA with size LEN. */ + void add (const void *data, size_t len) + { + val = iterative_hash (data, len, val); + } + + /* Hash in state from other inchash OTHER. */ + void merge (inchash &other) + { + val = iterative_hash_hashval_t (other.val, val); + } +}; + +#endif diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c index 59781a1..f3d35d6 100644 --- a/gcc/ipa-devirt.c +++ b/gcc/ipa-devirt.c @@ -118,6 +118,7 @@ along with GCC; see the file COPYING3. If not see #include "pointer-set.h" #include "target.h" #include "hash-table.h" +#include "inchash.h" #include "tree-pretty-print.h" #include "ipa-utils.h" #include "tree-ssa-alias.h" diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c index ccb85b6..cf2e9a8 100644 --- a/gcc/lto-streamer-out.c +++ b/gcc/lto-streamer-out.c @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "function.h" #include "diagnostic-core.h" +#include "inchash.h" #include "except.h" #include "lto-symtab.h" #include "lto-streamer.h" diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c index dc30884..300f569 100644 --- a/gcc/lto/lto.c +++ b/gcc/lto/lto.c @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. If not see #include "langhooks.h" #include "bitmap.h" #include "hash-map.h" +#include "inchash.h" #include "ipa-prop.h" #include "common.h" #include "debug.h" diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index c4ec4e5..08fd2fa 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -29,6 +29,7 @@ along with GCC; see the file COPYING3. If not see #include "tm_p.h" #include "basic-block.h" #include "cfgloop.h" +#include "inchash.h" #include "function.h" #include "gimple-pretty-print.h" #include "tree-ssa-alias.h" diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c index b01ad28..8abf68a 100644 --- a/gcc/tree-ssa-pre.c +++ b/gcc/tree-ssa-pre.c @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3. If not see #include "basic-block.h" #include "gimple-pretty-print.h" #include "tree-inline.h" +#include "inchash.h" #include "hash-table.h" #include "tree-ssa-alias.h" #include "internal-fn.h" diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c index 139ac3b..93314fc 100644 --- a/gcc/tree-ssa-sccvn.c +++ b/gcc/tree-ssa-sccvn.c @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3. If not see #include "hash-table.h" #include "tree-ssa-alias.h" #include "internal-fn.h" +#include "inchash.h" #include "gimple-fold.h" #include "tree-eh.h" #include "gimple-expr.h" diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c index 7245223..9600e28 100644 --- a/gcc/tree-ssa-tail-merge.c +++ b/gcc/tree-ssa-tail-merge.c @@ -192,6 +192,7 @@ along with GCC; see the file COPYING3. If not see #include "tree.h" #include "stor-layout.h" #include "trans-mem.h" +#include "inchash.h" #include "tm_p.h" #include "basic-block.h" #include "flags.h" diff --git a/gcc/tree.c b/gcc/tree.c index 46f8963..6de3835 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -42,6 +42,7 @@ along with GCC; see the file COPYING3. If not see #include "obstack.h" #include "toplev.h" /* get_random_seed */ #include "hashtab.h" +#include "inchash.h" #include "filenames.h" #include "output.h" #include "target.h" @@ -4576,56 +4577,6 @@ build_decl_attribute_variant (tree ddecl, tree attribute) return ddecl; } -/* Borrowed from hashtab.c iterative_hash implementation. */ -#define mix(a,b,c) \ -{ \ - a -= b; a -= c; a ^= (c>>13); \ - b -= c; b -= a; b ^= (a<< 8); \ - c -= a; c -= b; c ^= ((b&0xffffffff)>>13); \ - a -= b; a -= c; a ^= ((c&0xffffffff)>>12); \ - b -= c; b -= a; b = (b ^ (a<<16)) & 0xffffffff; \ - c -= a; c -= b; c = (c ^ (b>> 5)) & 0xffffffff; \ - a -= b; a -= c; a = (a ^ (c>> 3)) & 0xffffffff; \ - b -= c; b -= a; b = (b ^ (a<<10)) & 0xffffffff; \ - c -= a; c -= b; c = (c ^ (b>>15)) & 0xffffffff; \ -} - - -/* Produce good hash value combining VAL and VAL2. */ -hashval_t -iterative_hash_hashval_t (hashval_t val, hashval_t val2) -{ - /* the golden ratio; an arbitrary value. */ - hashval_t a = 0x9e3779b9; - - mix (a, val, val2); - return val2; -} - -/* Produce good hash value combining VAL and VAL2. */ -hashval_t -iterative_hash_host_wide_int (HOST_WIDE_INT val, hashval_t val2) -{ - if (sizeof (HOST_WIDE_INT) == sizeof (hashval_t)) - return iterative_hash_hashval_t (val, val2); - else - { - hashval_t a = (hashval_t) val; - /* Avoid warnings about shifting of more than the width of the type on - hosts that won't execute this path. */ - int zero = 0; - hashval_t b = (hashval_t) (val >> (sizeof (hashval_t) * 8 + zero)); - mix (a, b, val2); - if (sizeof (HOST_WIDE_INT) > 2 * sizeof (hashval_t)) - { - hashval_t a = (hashval_t) (val >> (sizeof (hashval_t) * 16 + zero)); - hashval_t b = (hashval_t) (val >> (sizeof (hashval_t) * 24 + zero)); - mix (a, b, val2); - } - return val2; - } -} - /* Return a type like TTYPE except that its TYPE_ATTRIBUTE is ATTRIBUTE and its qualifiers are QUALS. diff --git a/gcc/tree.h b/gcc/tree.h index 8fa928b..e454e2d 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -4288,9 +4288,6 @@ extern int tree_floor_log2 (const_tree); extern unsigned int tree_ctz (const_tree); extern int simple_cst_equal (const_tree, const_tree); extern hashval_t iterative_hash_expr (const_tree, hashval_t); -extern hashval_t iterative_hash_host_wide_int (HOST_WIDE_INT, hashval_t); -extern hashval_t iterative_hash_hashval_t (hashval_t, hashval_t); -extern hashval_t iterative_hash_host_wide_int (HOST_WIDE_INT, hashval_t); extern int compare_tree_int (const_tree, unsigned HOST_WIDE_INT); extern int type_list_equal (const_tree, const_tree); extern int chain_member (const_tree, const_tree);
From: Andi Kleen <ak@linux.intel.com> Some files in gcc, like lto or tree, do large scale incremential hashing. The current jhash implementation of this could be likely improved by using an incremential hash that does not do a full rehashing for every new value added. This patch adds a new "inchash" class that abstracts the internal state of the hash. This makes it easier to plug in new hashes and also cleans up the code a bit. Right now it is just implemented in the same way as the old iterative hash in tree.c. The previous iterative hash code from tree.c moved into a new separate file. Also I fixed up all users to include the new header. It should not really change any hashing by itself, it's just a cleanup at this point. gcc/: 2014-07-10 Andi Kleen <ak@linux.intel.com> * Makefile.in (OBJS): Add inchash.o. (PLUGIN_HEADERS): Add inchash.h. * ipa-devirt.c: Include inchash.h. * lto-streamer-out.c: Dito. * tree-ssa-dom.c: Dito. * tree-ssa-pre.c: Dito. * tree-ssa-sccvn.c: Dito. * tree-ssa-tail-merge.c: Dito. * asan.c: Dito. * tree.c (iterative_hash_hashval_t): Move to ... (iterative_hash_host_wide_int): Move to ... * inchash.c: Here. New file. * tree.h (iterative_hash_hashval_t): Move to ... (iterative_hash_host_wide_int): Move to ... * inchash.h: Here. New file. gcc/lto/: 2014-07-10 Andi Kleen <ak@linux.intel.com> * lto.c: Include inchash.h --- gcc/Makefile.in | 3 +- gcc/asan.c | 1 + gcc/inchash.c | 75 +++++++++++++++++++++++++++++++++++++++++ gcc/inchash.h | 86 +++++++++++++++++++++++++++++++++++++++++++++++ gcc/ipa-devirt.c | 1 + gcc/lto-streamer-out.c | 1 + gcc/lto/lto.c | 1 + gcc/tree-ssa-dom.c | 1 + gcc/tree-ssa-pre.c | 1 + gcc/tree-ssa-sccvn.c | 1 + gcc/tree-ssa-tail-merge.c | 1 + gcc/tree.c | 51 +--------------------------- gcc/tree.h | 3 -- 13 files changed, 172 insertions(+), 54 deletions(-) create mode 100644 gcc/inchash.c create mode 100644 gcc/inchash.h