diff mbox

[1/4] Add an abstract incremental hash data type

Message ID 1405488709-12677-2-git-send-email-andi@firstfloor.org
State New
Headers show

Commit Message

Andi Kleen July 16, 2014, 5:31 a.m. UTC
From: Andi Kleen <ak@linux.intel.com>

Some files in gcc, like lto or tree, do large scale incremential hashing.
The current jhash implementation of this could be likely improved
by using an incremential hash that does not do a full rehashing
for every new value added.

This patch adds a new "inchash" class that abstracts the internal
state of the hash. This makes it easier to plug in new hashes
and also cleans up the code a bit.

Right now it is just implemented in the same way as the old
iterative hash in tree.c. The previous iterative hash code
from tree.c moved into a new separate file. Also I fixed up all
users to include the new header.

It should not really change any hashing by itself, it's just
a cleanup at this point.

gcc/:

2014-07-10  Andi Kleen  <ak@linux.intel.com>

	* Makefile.in (OBJS): Add inchash.o.
	(PLUGIN_HEADERS): Add inchash.h.
	* ipa-devirt.c: Include inchash.h.
	* lto-streamer-out.c: Dito.
	* tree-ssa-dom.c: Dito.
	* tree-ssa-pre.c: Dito.
	* tree-ssa-sccvn.c: Dito.
	* tree-ssa-tail-merge.c: Dito.
	* asan.c: Dito.
	* tree.c (iterative_hash_hashval_t): Move to ...
	(iterative_hash_host_wide_int): Move to ...
	* inchash.c: Here. New file.
	* tree.h (iterative_hash_hashval_t): Move to ...
	(iterative_hash_host_wide_int): Move to ...
	* inchash.h: Here. New file.

gcc/lto/:

2014-07-10  Andi Kleen  <ak@linux.intel.com>

	* lto.c: Include inchash.h
---
 gcc/Makefile.in           |  3 +-
 gcc/asan.c                |  1 +
 gcc/inchash.c             | 75 +++++++++++++++++++++++++++++++++++++++++
 gcc/inchash.h             | 86 +++++++++++++++++++++++++++++++++++++++++++++++
 gcc/ipa-devirt.c          |  1 +
 gcc/lto-streamer-out.c    |  1 +
 gcc/lto/lto.c             |  1 +
 gcc/tree-ssa-dom.c        |  1 +
 gcc/tree-ssa-pre.c        |  1 +
 gcc/tree-ssa-sccvn.c      |  1 +
 gcc/tree-ssa-tail-merge.c |  1 +
 gcc/tree.c                | 51 +---------------------------
 gcc/tree.h                |  3 --
 13 files changed, 172 insertions(+), 54 deletions(-)
 create mode 100644 gcc/inchash.c
 create mode 100644 gcc/inchash.h

Comments

Trevor Saunders July 17, 2014, 2:40 a.m. UTC | #1
> +++ b/gcc/inchash.h
> +class inchash
> +{
> +  hashval_t val;

normal style would be explicit private: at the end.

> + public:
> +
> +  /* Start incremential hashing, optionally with SEED.  */
> +  void begin (hashval_t seed = 0)
> +  {
> +    val = seed;

why isn't this the ctor?

> +  /* Add unsigned value V.  */
> +  void add_int (unsigned v)
> +  {
> +    val = iterative_hash_hashval_t (v, val);
> +  }

istm this is a great spot to provide a bunch of overloads of just add()
and let the compiler pick the appropriate one for your type.

thanks!

Trev
Andi Kleen July 17, 2014, 4:36 a.m. UTC | #2
On Wed, Jul 16, 2014 at 10:40:53PM -0400, Trevor Saunders wrote:
> > +++ b/gcc/inchash.h
> > +class inchash
> > +{
> > +  hashval_t val;
> 
> normal style would be explicit private: at the end.

Ok.

> 
> > + public:
> > +
> > +  /* Start incremential hashing, optionally with SEED.  */
> > +  void begin (hashval_t seed = 0)
> > +  {
> > +    val = seed;
> 
> why isn't this the ctor?

It's standard for hash classes to have explicit begin()/end().
All the existing ones I've seen work this way.

> 
> > +  /* Add unsigned value V.  */
> > +  void add_int (unsigned v)
> > +  {
> > +    val = iterative_hash_hashval_t (v, val);
> > +  }
> 
> istm this is a great spot to provide a bunch of overloads of just add()
> and let the compiler pick the appropriate one for your type.

Sorry I'm not into code obfuscation. With hashing it's far better
to work with explicit visible types instead of invisible magic.

-Andi
Trevor Saunders July 18, 2014, 1:08 a.m. UTC | #3
On Thu, Jul 17, 2014 at 06:36:31AM +0200, Andi Kleen wrote:
> On Wed, Jul 16, 2014 at 10:40:53PM -0400, Trevor Saunders wrote:
> > 
> > > + public:
> > > +
> > > +  /* Start incremential hashing, optionally with SEED.  */
> > > +  void begin (hashval_t seed = 0)
> > > +  {
> > > +    val = seed;
> > 
> > why isn't this the ctor?
> 
> It's standard for hash classes to have explicit begin()/end().
> All the existing ones I've seen work this way.

 I only know of one vaguelly similar thing
 http://mxr.mozilla.org/mozilla-central/source/mfbt/SHA1.h#37  which
 doesn't do that, and a bunch of people doing something doesn't
 necessarily mean it makes sense.  Now there may be a good reason it
 does make sense, but unless these other people need begin() to be
 fallible I don't see it.

> > > +  /* Add unsigned value V.  */
> > > +  void add_int (unsigned v)
> > > +  {
> > > +    val = iterative_hash_hashval_t (v, val);
> > > +  }
> > 
> > istm this is a great spot to provide a bunch of overloads of just add()
> > and let the compiler pick the appropriate one for your type.
> 
> Sorry I'm not into code obfuscation. With hashing it's far better
> to work with explicit visible types instead of invisible magic.

if that were true I'd expect you'd see lots of cases of people using a
different hash function than the type of the expression being passed,
but a quick look at the later patches didn't show me any of those.
Not repeating the type of something is hardly obfiscation, and in most
cases there's only one sane function to call for a given expression.

but I guess its easy enough to change later if somebody gets really
annoyed by it so whatever.

Trev

> 
> -Andi
Jeff Law July 23, 2014, 3:34 a.m. UTC | #4
On 07/15/14 23:31, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
>
> Some files in gcc, like lto or tree, do large scale incremential hashing.
> The current jhash implementation of this could be likely improved
> by using an incremential hash that does not do a full rehashing
> for every new value added.
>
> This patch adds a new "inchash" class that abstracts the internal
> state of the hash. This makes it easier to plug in new hashes
> and also cleans up the code a bit.
>
> Right now it is just implemented in the same way as the old
> iterative hash in tree.c. The previous iterative hash code
> from tree.c moved into a new separate file. Also I fixed up all
> users to include the new header.
>
> It should not really change any hashing by itself, it's just
> a cleanup at this point.
>
> gcc/:
>
> 2014-07-10  Andi Kleen  <ak@linux.intel.com>
>
> 	* Makefile.in (OBJS): Add inchash.o.
> 	(PLUGIN_HEADERS): Add inchash.h.
> 	* ipa-devirt.c: Include inchash.h.
> 	* lto-streamer-out.c: Dito.
> 	* tree-ssa-dom.c: Dito.
> 	* tree-ssa-pre.c: Dito.
> 	* tree-ssa-sccvn.c: Dito.
> 	* tree-ssa-tail-merge.c: Dito.
> 	* asan.c: Dito.
> 	* tree.c (iterative_hash_hashval_t): Move to ...
> 	(iterative_hash_host_wide_int): Move to ...
> 	* inchash.c: Here. New file.
> 	* tree.h (iterative_hash_hashval_t): Move to ...
> 	(iterative_hash_host_wide_int): Move to ...
> 	* inchash.h: Here. New file.
If there isn't any concrete benefit from using explicit begin/end, I'd 
just initialize the data in the ctor.

Otherwise, this is a fine cleanup.

Presumably there's going to be some better hashing functions coming down 
the pipe?

Jeff
Richard Biener July 23, 2014, 9:37 a.m. UTC | #5
On Fri, Jul 18, 2014 at 3:08 AM, Trevor Saunders <tsaunders@mozilla.com> wrote:
> On Thu, Jul 17, 2014 at 06:36:31AM +0200, Andi Kleen wrote:
>> On Wed, Jul 16, 2014 at 10:40:53PM -0400, Trevor Saunders wrote:
>> >
>> > > + public:
>> > > +
>> > > +  /* Start incremential hashing, optionally with SEED.  */
>> > > +  void begin (hashval_t seed = 0)
>> > > +  {
>> > > +    val = seed;
>> >
>> > why isn't this the ctor?
>>
>> It's standard for hash classes to have explicit begin()/end().
>> All the existing ones I've seen work this way.
>
>  I only know of one vaguelly similar thing
>  http://mxr.mozilla.org/mozilla-central/source/mfbt/SHA1.h#37  which
>  doesn't do that, and a bunch of people doing something doesn't
>  necessarily mean it makes sense.  Now there may be a good reason it
>  does make sense, but unless these other people need begin() to be
>  fallible I don't see it.

I agree with Trevor here.  Please make begin() the constructor.
Btw, what will be the way to plug in an alternative hash function?
That is, there doesn't seem to be a separation of interface
and implementation in your patch (like with a template or a base-class
you inherit from).

Richard.

>> > > +  /* Add unsigned value V.  */
>> > > +  void add_int (unsigned v)
>> > > +  {
>> > > +    val = iterative_hash_hashval_t (v, val);
>> > > +  }
>> >
>> > istm this is a great spot to provide a bunch of overloads of just add()
>> > and let the compiler pick the appropriate one for your type.
>>
>> Sorry I'm not into code obfuscation. With hashing it's far better
>> to work with explicit visible types instead of invisible magic.
>
> if that were true I'd expect you'd see lots of cases of people using a
> different hash function than the type of the expression being passed,
> but a quick look at the later patches didn't show me any of those.
> Not repeating the type of something is hardly obfiscation, and in most
> cases there's only one sane function to call for a given expression.
>
> but I guess its easy enough to change later if somebody gets really
> annoyed by it so whatever.
>
> Trev
>
>>
>> -Andi
Andi Kleen July 23, 2014, 2:27 p.m. UTC | #6
> Btw, what will be the way to plug in an alternative hash function?
> That is, there doesn't seem to be a separation of interface
> and implementation in your patch (like with a template or a base-class
> you inherit from).

Just change the inchash.h include file. The point was to only
change a single place.

Inheritance would need changing everything again for the new type
(unless we came up with hash factories that would likely defeat
virtualization and would be over engineering)

-Andi
Richard Biener July 23, 2014, 6 p.m. UTC | #7
On July 23, 2014 4:27:43 PM CEST, Andi Kleen <ak@linux.intel.com> wrote:
>> Btw, what will be the way to plug in an alternative hash function?
>> That is, there doesn't seem to be a separation of interface
>> and implementation in your patch (like with a template or a
>base-class
>> you inherit from).
>
>Just change the inchash.h include file. The point was to only
>change a single place.

So there will be at most one hash implementation?  Maybe use a namespace instead of a hash then?
So other places can extend it?

Why didn't you replace the tree.c uses BTW?

Thanks,
Richard.

>Inheritance would need changing everything again for the new type
>(unless we came up with hash factories that would likely defeat
>virtualization and would be over engineering)
>
>-Andi
Andi Kleen July 23, 2014, 7:34 p.m. UTC | #8
> So there will be at most one hash implementation? 

One per binary I expect. Modern hash functions are pretty good,
so it's unlikely that someone needs to come up with special
purpose hashes.

I found Bob Jenkins' spooky is rather good for this case (very
large incremential keys), but it is only efficient on 64bit hosts.

So it would need a fallback (like mumurhash2a or the existing one) on 32bit.

Another alternative would be to use CityHash, but that also has multiple
variants, including some machine specific ones (e.g. use the CRC
instructions on SSE2 hosts)

> Maybe use a namespace instead of a hash then?

I don't understand the suggestion.

> So other places can extend it?

Not sure that is needed.

> Why didn't you replace the tree.c uses BTW?

Patches were already quite big, but I'll add it.

-Andi
Andi Kleen July 23, 2014, 7:56 p.m. UTC | #9
> > Why didn't you replace the tree.c uses BTW?
> 
> Patches were already quite big, but I'll add it.

Actually I handled them all in tree.c. Did you
mean something else?

I didn't convert all of tree-ssa-* and dwarf* so far,
and a few other places.  This can be done step by step.

-Andi
Oleg Endo July 24, 2014, midnight UTC | #10
On Wed, 2014-07-23 at 11:37 +0200, Richard Biener wrote:
> On Fri, Jul 18, 2014 at 3:08 AM, Trevor Saunders <tsaunders@mozilla.com> wrote:
> > On Thu, Jul 17, 2014 at 06:36:31AM +0200, Andi Kleen wrote:
> >> On Wed, Jul 16, 2014 at 10:40:53PM -0400, Trevor Saunders wrote:
> >> >
> >> > > + public:
> >> > > +
> >> > > +  /* Start incremential hashing, optionally with SEED.  */
> >> > > +  void begin (hashval_t seed = 0)
> >> > > +  {
> >> > > +    val = seed;
> >> >
> >> > why isn't this the ctor?
> >>
> >> It's standard for hash classes to have explicit begin()/end().
> >> All the existing ones I've seen work this way.
> >
> >  I only know of one vaguelly similar thing
> >  http://mxr.mozilla.org/mozilla-central/source/mfbt/SHA1.h#37  which
> >  doesn't do that, and a bunch of people doing something doesn't
> >  necessarily mean it makes sense.  Now there may be a good reason it
> >  does make sense, but unless these other people need begin() to be
> >  fallible I don't see it.
> 
> I agree with Trevor here.  Please make begin() the constructor.
> Btw, what will be the way to plug in an alternative hash function?
> That is, there doesn't seem to be a separation of interface
> and implementation in your patch (like with a template or a base-class
> you inherit from).

Isn't that what boost / std hash is actually doing?  Maybe something
like the attached example could be an improvement?

As with boost / std hash, the hash function is a template functor that
can be specialized on various types.  The 'incremental_hash' simply
plugs together a hash value combiner and a hash function.  Types that
are not implemented by the hash function (in this example) are caught at
compile time, since implicit conversions do not take place.  However, it
should also possible to write hash functions with a bunch of operator ()
overloads to utilize implicit conversions.

One advantage of this construct is that new types can be added along
with hash function specializations, without the need to touch any of the
existing hashing facilities.

I don't know how/whether this would fit with the already existing hash
map stuff (hash-table.h, hash-map.h).  It seems those don't support user
specified hash functions.

Cheers,
Oleg
Trevor Saunders July 24, 2014, 1:48 a.m. UTC | #11
On Thu, Jul 24, 2014 at 02:00:24AM +0200, Oleg Endo wrote:
> On Wed, 2014-07-23 at 11:37 +0200, Richard Biener wrote:
> > On Fri, Jul 18, 2014 at 3:08 AM, Trevor Saunders <tsaunders@mozilla.com> wrote:
> > > On Thu, Jul 17, 2014 at 06:36:31AM +0200, Andi Kleen wrote:
> > >> On Wed, Jul 16, 2014 at 10:40:53PM -0400, Trevor Saunders wrote:
> > >> >
> > >> > > + public:
> > >> > > +
> > >> > > +  /* Start incremential hashing, optionally with SEED.  */
> > >> > > +  void begin (hashval_t seed = 0)
> > >> > > +  {
> > >> > > +    val = seed;
> > >> >
> > >> > why isn't this the ctor?
> > >>
> > >> It's standard for hash classes to have explicit begin()/end().
> > >> All the existing ones I've seen work this way.
> > >
> > >  I only know of one vaguelly similar thing
> > >  http://mxr.mozilla.org/mozilla-central/source/mfbt/SHA1.h#37  which
> > >  doesn't do that, and a bunch of people doing something doesn't
> > >  necessarily mean it makes sense.  Now there may be a good reason it
> > >  does make sense, but unless these other people need begin() to be
> > >  fallible I don't see it.
> > 
> > I agree with Trevor here.  Please make begin() the constructor.
> > Btw, what will be the way to plug in an alternative hash function?
> > That is, there doesn't seem to be a separation of interface
> > and implementation in your patch (like with a template or a base-class
> > you inherit from).
> 
> Isn't that what boost / std hash is actually doing?  Maybe something
> like the attached example could be an improvement?

I don't really see why that makes it any easier to plug in a different
hash algorithm

> As with boost / std hash, the hash function is a template functor that
> can be specialized on various types.  The 'incremental_hash' simply
> plugs together a hash value combiner and a hash function.  Types that
> are not implemented by the hash function (in this example) are caught at
> compile time, since implicit conversions do not take place.  However, it
> should also possible to write hash functions with a bunch of operator ()
> overloads to utilize implicit conversions.

at that point it seems like its simpler and equivelent to just overload
a global function.

> One advantage of this construct is that new types can be added along
> with hash function specializations, without the need to touch any of the
> existing hashing facilities.
> 
> I don't know how/whether this would fit with the already existing hash
> map stuff (hash-table.h, hash-map.h).  It seems those don't support user
> specified hash functions.

they do, see e.g. graphite-htab.h

Trev

> 
> Cheers,
> Oleg
> 

> typedef unsigned int hashval_t;
> typedef long long HOST_WIDE_INT;
> typedef unsigned int size_t;
> 
> // -----------------------------
> // hash function and some specializations
> 
> struct raw_data
> {
>   const void* data;
>   size_t len;
> 
>   raw_data (const void* d, size_t l) : data (d), len (l) { }
> 
>   template <typename T>
>   explicit raw_data (const T& val) : data (&val), len (sizeof (T)) { }
> };
> 
> template <typename T> struct my_hash;
> 
> template <> struct my_hash<unsigned int>
> {
>   hashval_t operator () (unsigned int val);
> };
> 
> template <> struct my_hash<HOST_WIDE_INT>
> {
>   hashval_t operator () (HOST_WIDE_INT val);
> };
> 
> template <> struct my_hash<raw_data>
> {
>   hashval_t operator () (raw_data val);
> };
> 
> template <> struct my_hash<const void*>
> {
>   hashval_t operator () (const void* val);
> };
> 
> template <typename T> struct my_hash<T*>
> {
>   hashval_t operator () (T* val)
>   {
>     return my_hash<const void*> () (val);
>   }
> };
> 
> template <typename T> struct my_hash<const T*>
> {
>   hashval_t operator () (const T* val)
>   {
>     return my_hash<const void*> () (val);
>   }
> };
> 
> // -----------------------------
> // a possible hash combiner
> struct my_hash_combiner
> {
>   hashval_t operator () (hashval_t a, hashval_t b);
> };
> 
> 
> // -----------------------------
> // generic incremental hash
> 
> template <template<typename> class HashFunction, class HashValueCombiner>
> class incremental_hash
> {
> public:
>   incremental_hash (void) : m_value (0) { }
>   incremental_hash (hashval_t seed) : m_value (seed) { }
> 
>   hashval_t value (void) const { return m_value; }
> 
>   template <typename T> incremental_hash& add (const T& val)
>   {
>     m_value = HashValueCombiner () (m_value, HashFunction<T> () (val));
>     return *this;
>   }
> 
> private:
>   hashval_t m_value;
> };
> 
> // hash function of an incremental_hash is its current value, regardless
> // of the actual hash function used.
> template <template<typename> class Func, class Combiner>
> struct my_hash< incremental_hash<Func, Combiner> >
> {
>   hashval_t operator () (const incremental_hash<Func, Combiner>& val)
>   {
>     return val.value ();
>   }
> };
> 
> 
> // -----------------------------
> // use case
> 
> struct my_object
> {
>   int a, b, c, d;
> };
> 
> typedef incremental_hash<my_hash, my_hash_combiner> my_inc_hash;
> 
> int main (int argc, const char* argv[])
> {
>   my_object some_obj;
> 
>   my_inc_hash hash;
> 
>   hash.add (5U)
>       .add<unsigned int> (234)     // explicit hash func, implicit conversion.
>       .add (HOST_WIDE_INT (2135))
>       .add (raw_data (argv, argc)) // raw data hash
>       .add (raw_data (some_obj))   // iterative_hash_object macro
>       .add (argv)                  // pointer hash
>       .add (argv[0]);              // pointer hash
> 
>   my_inc_hash hash2 (0x5533);
>   hash2.add (123U)
>        .add (hash);  // 'merge' two incremental hashes.
> 
>   return hash2.value ();
> }
Oleg Endo July 24, 2014, 7:05 a.m. UTC | #12
On 24 Jul 2014, at 03:48, Trevor Saunders <tsaunders@mozilla.com> wrote:

> On Thu, Jul 24, 2014 at 02:00:24AM +0200, Oleg Endo wrote:
>> On Wed, 2014-07-23 at 11:37 +0200, Richard Biener wrote:
>>> On Fri, Jul 18, 2014 at 3:08 AM, Trevor Saunders <tsaunders@mozilla.com> wrote:
>>>> On Thu, Jul 17, 2014 at 06:36:31AM +0200, Andi Kleen wrote:
>>>>> On Wed, Jul 16, 2014 at 10:40:53PM -0400, Trevor Saunders wrote:
>>>>>> 
>>>>>>> + public:
>>>>>>> +
>>>>>>> +  /* Start incremential hashing, optionally with SEED.  */
>>>>>>> +  void begin (hashval_t seed = 0)
>>>>>>> +  {
>>>>>>> +    val = seed;
>>>>>> 
>>>>>> why isn't this the ctor?
>>>>> 
>>>>> It's standard for hash classes to have explicit begin()/end().
>>>>> All the existing ones I've seen work this way.
>>>> 
>>>> I only know of one vaguelly similar thing
>>>> http://mxr.mozilla.org/mozilla-central/source/mfbt/SHA1.h#37  which
>>>> doesn't do that, and a bunch of people doing something doesn't
>>>> necessarily mean it makes sense.  Now there may be a good reason it
>>>> does make sense, but unless these other people need begin() to be
>>>> fallible I don't see it.
>>> 
>>> I agree with Trevor here.  Please make begin() the constructor.
>>> Btw, what will be the way to plug in an alternative hash function?
>>> That is, there doesn't seem to be a separation of interface
>>> and implementation in your patch (like with a template or a base-class
>>> you inherit from).
>> 
>> Isn't that what boost / std hash is actually doing?  Maybe something
>> like the attached example could be an improvement?
> 
> I don't really see why that makes it any easier to plug in a different
> hash algorithm

typedef incremental_hash<my_hash, my_hash_combiner> my_inc_hash;
                         ^^^^^^^
                      replace with a different class

True, it's not any 'easier' than replacing any other name in the code.
It makes it easier to mix different type dependent hash functions/
algorithms, though.  The way I understood it, Richard asked how to replace
the hash function for the incremental hash class.

> 
>> As with boost / std hash, the hash function is a template functor that
>> can be specialized on various types.  The 'incremental_hash' simply
>> plugs together a hash value combiner and a hash function.  Types that
>> are not implemented by the hash function (in this example) are caught at
>> compile time, since implicit conversions do not take place.  However, it
>> should also possible to write hash functions with a bunch of operator ()
>> overloads to utilize implicit conversions.
> 
> at that point it seems like its simpler and equivelent to just overload
> a global function.

AFAIR, there are some cases which are difficult to handle with overloads
(implicit conversions etc).  It could be a freestanding template
function that is specialized for the types in question.


> 
>> One advantage of this construct is that new types can be added along
>> with hash function specializations, without the need to touch any of the
>> existing hashing facilities.
>> 
>> I don't know how/whether this would fit with the already existing hash
>> map stuff (hash-table.h, hash-map.h).  It seems those don't support user
>> specified hash functions.
> 
> they do, see e.g. graphite-htab.h

Aha.  I see.


Cheers,
Oleg
Richard Biener July 24, 2014, 8:54 a.m. UTC | #13
On Wed, Jul 23, 2014 at 9:56 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> > Why didn't you replace the tree.c uses BTW?
>>
>> Patches were already quite big, but I'll add it.
>
> Actually I handled them all in tree.c. Did you
> mean something else?

I meant transform the iterative_hash_* _implementations_ to
use the interface.  But maybe I missed sth in the patch(es).

> I didn't convert all of tree-ssa-* and dwarf* so far,
> and a few other places.  This can be done step by step.

Sure.

Can you re-post the adjusted patch introducing inchash.[ch]?

Thanks,
Richard.

> -Andi
diff mbox

Patch

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 187e6b6..4c578b3 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1268,6 +1268,7 @@  OBJS = \
 	hwint.o \
 	ifcvt.o \
 	ree.o \
+	inchash.o \
 	incpath.o \
 	init-regs.o \
 	internal-fn.o \
@@ -3162,7 +3163,7 @@  PLUGIN_HEADERS = $(TREE_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   tree-parloops.h tree-ssa-address.h tree-ssa-coalesce.h tree-ssa-dom.h \
   tree-ssa-loop.h tree-ssa-loop-ivopts.h tree-ssa-loop-manip.h \
   tree-ssa-loop-niter.h tree-ssa-ter.h tree-ssa-threadedge.h \
-  tree-ssa-threadupdate.h
+  tree-ssa-threadupdate.h inchash.h
 
 # generate the 'build fragment' b-header-vars
 s-header-vars: Makefile
diff --git a/gcc/asan.c b/gcc/asan.c
index b9a4a91..b5c8f91 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -29,6 +29,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "internal-fn.h"
 #include "gimple-expr.h"
 #include "is-a.h"
+#include "inchash.h"
 #include "gimple.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
diff --git a/gcc/inchash.c b/gcc/inchash.c
new file mode 100644
index 0000000..0f8583e
--- /dev/null
+++ b/gcc/inchash.c
@@ -0,0 +1,75 @@ 
+/* Incremential hashing for jhash.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "hashtab.h"
+#include "inchash.h"
+
+/* Borrowed from hashtab.c iterative_hash implementation.  */
+#define mix(a,b,c) \
+{ \
+  a -= b; a -= c; a ^= (c>>13); \
+  b -= c; b -= a; b ^= (a<< 8); \
+  c -= a; c -= b; c ^= ((b&0xffffffff)>>13); \
+  a -= b; a -= c; a ^= ((c&0xffffffff)>>12); \
+  b -= c; b -= a; b = (b ^ (a<<16)) & 0xffffffff; \
+  c -= a; c -= b; c = (c ^ (b>> 5)) & 0xffffffff; \
+  a -= b; a -= c; a = (a ^ (c>> 3)) & 0xffffffff; \
+  b -= c; b -= a; b = (b ^ (a<<10)) & 0xffffffff; \
+  c -= a; c -= b; c = (c ^ (b>>15)) & 0xffffffff; \
+}
+
+
+/* Produce good hash value combining VAL and VAL2.  */
+hashval_t
+iterative_hash_hashval_t (hashval_t val, hashval_t val2)
+{
+  /* the golden ratio; an arbitrary value.  */
+  hashval_t a = 0x9e3779b9;
+
+  mix (a, val, val2);
+  return val2;
+}
+
+/* Produce good hash value combining VAL and VAL2.  */
+
+hashval_t
+iterative_hash_host_wide_int (HOST_WIDE_INT val, hashval_t val2)
+{
+  if (sizeof (HOST_WIDE_INT) == sizeof (hashval_t))
+    return iterative_hash_hashval_t (val, val2);
+  else
+    {
+      hashval_t a = (hashval_t) val;
+      /* Avoid warnings about shifting of more than the width of the type on
+         hosts that won't execute this path.  */
+      int zero = 0;
+      hashval_t b = (hashval_t) (val >> (sizeof (hashval_t) * 8 + zero));
+      mix (a, b, val2);
+      if (sizeof (HOST_WIDE_INT) > 2 * sizeof (hashval_t))
+	{
+	  hashval_t a = (hashval_t) (val >> (sizeof (hashval_t) * 16 + zero));
+	  hashval_t b = (hashval_t) (val >> (sizeof (hashval_t) * 24 + zero));
+	  mix (a, b, val2);
+	}
+      return val2;
+    }
+}
diff --git a/gcc/inchash.h b/gcc/inchash.h
new file mode 100644
index 0000000..071cf92
--- /dev/null
+++ b/gcc/inchash.h
@@ -0,0 +1,86 @@ 
+/* An incremental hash abstract data type.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef INCHASH_H
+#define INCHASH_H 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "hashtab.h"
+
+/* This file implements an incremential hash function ADT, to be used
+   by code that incrementially hashes a lot of unrelated data
+   (not in a single memory block) into a single value. The goal
+   is to make it easy to plug in efficient hash algorithms.
+   Currently it just implements the plain old jhash based
+   incremental hash from gcc's tree.c.  */
+
+extern hashval_t iterative_hash_host_wide_int (HOST_WIDE_INT, hashval_t);
+extern hashval_t iterative_hash_hashval_t (hashval_t, hashval_t);
+
+class inchash
+{
+  hashval_t val;
+ public:
+
+  /* Start incremential hashing, optionally with SEED.  */
+  void begin (hashval_t seed = 0)
+  {
+    val = seed;
+  }
+
+  /* End incremential hashing and provide the final value.  */
+  hashval_t end ()
+  {
+    return val;
+  }
+
+  /* Add unsigned value V.  */
+  void add_int (unsigned v)
+  {
+    val = iterative_hash_hashval_t (v, val);
+  }
+
+  /* Add HOST_WIDE_INT value V.  */
+  void add_wide_int (HOST_WIDE_INT v)
+  {
+    val = iterative_hash_host_wide_int (v, val);
+  }
+
+  /* Hash in pointer PTR.  */
+  void add_ptr (void *ptr)
+  {
+    add (&ptr, sizeof (ptr));
+  }
+
+  /* Add a memory block DATA with size LEN.  */
+  void add (const void *data, size_t len)
+  {
+    val = iterative_hash (data, len, val);
+  }
+
+  /* Hash in state from other inchash OTHER.  */
+  void merge (inchash &other)
+  {
+    val = iterative_hash_hashval_t (other.val, val);
+  }
+};
+
+#endif
diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index 59781a1..f3d35d6 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -118,6 +118,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "pointer-set.h"
 #include "target.h"
 #include "hash-table.h"
+#include "inchash.h"
 #include "tree-pretty-print.h"
 #include "ipa-utils.h"
 #include "tree-ssa-alias.h"
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index ccb85b6..cf2e9a8 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -44,6 +44,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "function.h"
 #include "diagnostic-core.h"
+#include "inchash.h"
 #include "except.h"
 #include "lto-symtab.h"
 #include "lto-streamer.h"
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index dc30884..300f569 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -33,6 +33,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "bitmap.h"
 #include "hash-map.h"
+#include "inchash.h"
 #include "ipa-prop.h"
 #include "common.h"
 #include "debug.h"
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index c4ec4e5..08fd2fa 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -29,6 +29,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tm_p.h"
 #include "basic-block.h"
 #include "cfgloop.h"
+#include "inchash.h"
 #include "function.h"
 #include "gimple-pretty-print.h"
 #include "tree-ssa-alias.h"
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index b01ad28..8abf68a 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -27,6 +27,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "basic-block.h"
 #include "gimple-pretty-print.h"
 #include "tree-inline.h"
+#include "inchash.h"
 #include "hash-table.h"
 #include "tree-ssa-alias.h"
 #include "internal-fn.h"
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 139ac3b..93314fc 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -30,6 +30,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "hash-table.h"
 #include "tree-ssa-alias.h"
 #include "internal-fn.h"
+#include "inchash.h"
 #include "gimple-fold.h"
 #include "tree-eh.h"
 #include "gimple-expr.h"
diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c
index 7245223..9600e28 100644
--- a/gcc/tree-ssa-tail-merge.c
+++ b/gcc/tree-ssa-tail-merge.c
@@ -192,6 +192,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "stor-layout.h"
 #include "trans-mem.h"
+#include "inchash.h"
 #include "tm_p.h"
 #include "basic-block.h"
 #include "flags.h"
diff --git a/gcc/tree.c b/gcc/tree.c
index 46f8963..6de3835 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -42,6 +42,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "obstack.h"
 #include "toplev.h" /* get_random_seed */
 #include "hashtab.h"
+#include "inchash.h"
 #include "filenames.h"
 #include "output.h"
 #include "target.h"
@@ -4576,56 +4577,6 @@  build_decl_attribute_variant (tree ddecl, tree attribute)
   return ddecl;
 }
 
-/* Borrowed from hashtab.c iterative_hash implementation.  */
-#define mix(a,b,c) \
-{ \
-  a -= b; a -= c; a ^= (c>>13); \
-  b -= c; b -= a; b ^= (a<< 8); \
-  c -= a; c -= b; c ^= ((b&0xffffffff)>>13); \
-  a -= b; a -= c; a ^= ((c&0xffffffff)>>12); \
-  b -= c; b -= a; b = (b ^ (a<<16)) & 0xffffffff; \
-  c -= a; c -= b; c = (c ^ (b>> 5)) & 0xffffffff; \
-  a -= b; a -= c; a = (a ^ (c>> 3)) & 0xffffffff; \
-  b -= c; b -= a; b = (b ^ (a<<10)) & 0xffffffff; \
-  c -= a; c -= b; c = (c ^ (b>>15)) & 0xffffffff; \
-}
-
-
-/* Produce good hash value combining VAL and VAL2.  */
-hashval_t
-iterative_hash_hashval_t (hashval_t val, hashval_t val2)
-{
-  /* the golden ratio; an arbitrary value.  */
-  hashval_t a = 0x9e3779b9;
-
-  mix (a, val, val2);
-  return val2;
-}
-
-/* Produce good hash value combining VAL and VAL2.  */
-hashval_t
-iterative_hash_host_wide_int (HOST_WIDE_INT val, hashval_t val2)
-{
-  if (sizeof (HOST_WIDE_INT) == sizeof (hashval_t))
-    return iterative_hash_hashval_t (val, val2);
-  else
-    {
-      hashval_t a = (hashval_t) val;
-      /* Avoid warnings about shifting of more than the width of the type on
-         hosts that won't execute this path.  */
-      int zero = 0;
-      hashval_t b = (hashval_t) (val >> (sizeof (hashval_t) * 8 + zero));
-      mix (a, b, val2);
-      if (sizeof (HOST_WIDE_INT) > 2 * sizeof (hashval_t))
-	{
-	  hashval_t a = (hashval_t) (val >> (sizeof (hashval_t) * 16 + zero));
-	  hashval_t b = (hashval_t) (val >> (sizeof (hashval_t) * 24 + zero));
-	  mix (a, b, val2);
-	}
-      return val2;
-    }
-}
-
 /* Return a type like TTYPE except that its TYPE_ATTRIBUTE
    is ATTRIBUTE and its qualifiers are QUALS.
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 8fa928b..e454e2d 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4288,9 +4288,6 @@  extern int tree_floor_log2 (const_tree);
 extern unsigned int tree_ctz (const_tree);
 extern int simple_cst_equal (const_tree, const_tree);
 extern hashval_t iterative_hash_expr (const_tree, hashval_t);
-extern hashval_t iterative_hash_host_wide_int (HOST_WIDE_INT, hashval_t);
-extern hashval_t iterative_hash_hashval_t (hashval_t, hashval_t);
-extern hashval_t iterative_hash_host_wide_int (HOST_WIDE_INT, hashval_t);
 extern int compare_tree_int (const_tree, unsigned HOST_WIDE_INT);
 extern int type_list_equal (const_tree, const_tree);
 extern int chain_member (const_tree, const_tree);