Minime80
11-04-2001, 08:44 PM
What're some operations to use on ascii values that would make a decent hash function for peoples' names?
|
Click to See Complete Forum and Search --> : Hash functions... Minime80 11-04-2001, 08:44 PM What're some operations to use on ascii values that would make a decent hash function for peoples' names? jemfinch 11-04-2001, 11:37 PM Here's an excellent one, in C: uint32 cdb_hashadd(uint32 h,unsigned char c) { h += (h << 5); return h ^ c; } uint32 cdb_hash(char *buf,unsigned int len) { uint32 h; h = CDB_HASHSTART; while (len) { h = cdb_hashadd(h,*buf++); --len; } return h; } There are other, simpler ones, like simply multiplying the hash value by 31, 33, or 37 and adding the ascii value of the character -- perl uses one of those schemes, iirc. The one above is from DJB's CDB package, and works well for me. Jeremy Minime80 11-05-2001, 08:36 PM Looks nice. I just have a couple questions though. What's CDB_HASHSTART? Should the line in cdb_hash where you call cdb_hash add be changed to h += cdb_hashadd(...); cause the way it's written now looks like your just going to end up with the value that cdb_hashadd came up with for the last character. Oh, and if the hash table is of length "LENGTH". Would cdb_hash return "h % LENGTH" instead of just h? jkm 11-05-2001, 08:54 PM depending on the size of the table, md5 or another cryptographically secure hash algorithm would be a good idea, since collisions would be next to improbable. jemfinch 11-06-2001, 03:34 AM Originally posted by jkm: depending on the size of the table, md5 or another cryptographically secure hash algorithm would be a good idea, since collisions would be next to improbable. MD5 is slow, though, and with any sane-sized hashtable, its cryptographic security would be modded away -- any of the above hashes will get equivalent distributions. Jeremy jemfinch 11-06-2001, 03:36 AM Oh, CDB_HASHSTART is 5381. Jeremy pinoy 11-06-2001, 04:43 AM you might also want to make your modulus/table size a prime number. jemfinch 11-06-2001, 08:50 AM Do note, however, that the hash function from cdb (the one I actually posted code for) works particularly well for non-prime moduluses. In cdb, it's used with a 256 modulus, in fact. Jeremy pinoy 11-06-2001, 04:36 PM I must try that sometime. Is that DJB the same guy who wrote QMail? I've read QMail before, it was pretty tight and secure, but hard to follow with all the hundreds of little files. justlinux.com
Copyright Internet.com Inc. All Rights Reserved. |