Re: Hash function

From: Oleg Khovayko <olegh@dont-contact.us>
Date: Sun, 24 Oct 2004 23:46:07 -0400

Henrik Nordstrom wrote:

> Please try to keep discussion on the mailinglist.

OK, I'll do.

>
> Does djb2 v2 (xor instead of +) also share this property? I suppose it
> does but I am not sure.. (my initial brainwaves on the problem says it
> doesn't but probably have other similar "bad" properties)

Yes, you absolutely right. It has another bad properties, but has that
any way.
Following program shows this type of weakness for djb2-xor function:

#include <stdio.h>

unsigned h(const char *p) {
  register unsigned rc = 5381;
  unsigned char c;
  while(c = *p++) rc = (rc * 33) ^ c;
  return rc;
}

main() {
  printf("---------------\n%d\n%d\n%d\n%d\n%d\n",
  h("2r2r2r2r"), h("00000000"), h("2r00002r"), h("002r2r00"),
h("2r002r00"));
}

> Fortunately most uses of these hashes in Squid is on strings not
> easily attackable by this property as the strings follow a strict
> pattern or must be verifiable, but there may obviously be some use
> which is exposed.

I not think for especial hacker attack. I afraid - in the some subnets
can be presents "computer-generated" file names or URL's which can
catch squid.
I understand - this is low probability, but in another hand - squid
program is distributed so wide...

By my opinion, real way for avoid "linear trap" - use array of
pseudo-rand() values, like I've introduced in the my function from first
letter.

>
> It is also the case that most if not most of these hash tables in
> Squid is relatively small in size, so there is not much of an
> opportunity of "attack" as even the worst case of all entries in a
> single bucket is not that bad.. and while under attack there is many
> other bottlenecks.

About small arrays size. I think, this isn't good. If table too small,
it means - chains too long. Need to keep chains up to 10..15 elements
size, no more.
Otherwise, search becomes too long. I want seriously discuss about this
topic next time.
I believe, for your program need to use "open addressation" scheme,
where is tables automaticaly changes
size during "rehash" procedure.

>
> 512 words is 4KB, which is considerably larger than the whole hash.o
> object..
>
512 words is 2KB, if assume sizeof(WORD) == 4.

>
> OR
>
> Code segment not readable on the platform in question
>

Hm! This is real and serious contrary argument! I don't know about such
platform, but I agree with you - for portability,
better way to use special pseudorandom array, filled during INIT()
procedure, and don't read code segment.

If do this - I think, we can cut table to 256 words (this is enough any
way), if calculate array index by:

(unsigned char)(rc + c)

instead of previous:
 
(unsigned char)rc + c
Received on Sun Oct 24 2004 - 22:37:34 MDT

This archive was generated by hypermail pre-2.1.9 : Sun Oct 31 2004 - 12:00:02 MST