Click to See Complete Forum and Search --> : Hash functions...


Minime80
11-04-2001, 08:44 PM
What're some operations to use on ascii values that would make a decent hash function for peoples' names?

jemfinch
11-04-2001, 11:37 PM
Here's an excellent one, in C:


uint32 cdb_hashadd(uint32 h,unsigned char c)
{
h += (h << 5);
return h ^ c;
}

uint32 cdb_hash(char *buf,unsigned int len)
{
uint32 h;

h = CDB_HASHSTART;
while (len) {
h = cdb_hashadd(h,*buf++);
--len;
}
return h;
}


There are other, simpler ones, like simply multiplying the hash value by 31, 33,
or 37 and adding the ascii value of the character -- perl uses one of those
schemes, iirc. The one above is from DJB's CDB package, and works well for me.

Jeremy

Minime80
11-05-2001, 08:36 PM
Looks nice. I just have a couple questions though. What's CDB_HASHSTART? Should the line in cdb_hash where you call cdb_hash add be changed to

h += cdb_hashadd(...);

cause the way it's written now looks like your just going to end up with the value that cdb_hashadd came up with for the last character. Oh, and if the hash table is of length "LENGTH". Would cdb_hash return "h % LENGTH" instead of just h?

jkm
11-05-2001, 08:54 PM
depending on the size of the table, md5 or another cryptographically secure hash algorithm would be a good idea, since collisions would be next to improbable.

jemfinch
11-06-2001, 03:34 AM
Originally posted by jkm:
depending on the size of the table, md5 or another cryptographically secure hash algorithm would be a good idea, since collisions would be next to improbable.

MD5 is slow, though, and with any sane-sized hashtable, its cryptographic
security would be modded away -- any of the above hashes will get equivalent
distributions.

Jeremy

jemfinch
11-06-2001, 03:36 AM
Oh, CDB_HASHSTART is 5381.

Jeremy

pinoy
11-06-2001, 04:43 AM
you might also want to make your modulus/table size a prime number.

jemfinch
11-06-2001, 08:50 AM
Do note, however, that the hash function from cdb (the one I actually posted
code for) works particularly well for non-prime moduluses. In cdb, it's used
with a 256 modulus, in fact.

Jeremy

pinoy
11-06-2001, 04:36 PM
I must try that sometime. Is that DJB the same guy who wrote QMail? I've read QMail before, it was pretty tight and secure, but hard to follow with all the hundreds of little files.