Re: Hash function from Henrik Nordstrom on 2004-10-24 (squid-dev)

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Mon, 25 Oct 2004 02:14:34 +0200 (CEST)

Please try to keep discussion on the mailinglist.

On Sun, 24 Oct 2004, Oleg Khovayko wrote:

> You offering to use djb2 function, which uses algorithm, as
> same as "hash4" function from your old code.
> I already have explained weakness of this function: it
> sensitive for some character pairs.

Sorry, I dit not notice this as first due to the optimization clutter in
hash4 but you are right.

Does djb2 v2 (xor instead of +) also share this property? I suppose it
does but I am not sure.. (my initial brainwaves on the problem says it
doesn't but probably have other similar "bad" properties)

Fortunately most uses of these hashes in Squid is on strings not easily
attackable by this property as the strings follow a strict pattern or must
be verifiable, but there may obviously be some use which is exposed.

It is also the case that most if not most of these hash tables in Squid is
relatively small in size, so there is not much of an opportunity of
"attack" as even the worst case of all entries in a single bucket is not
that bad.. and while under attack there is many other bottlenecks.

> When you'll replace "hash4" to "djb2", function weakness will be same as
> current hash4. Details & examples you can found in the my previous
> letter [2]. Also, if you interesting, I can to explain you why "k=33"
> is better, than many other values.
>
>> I don't like this property as it assumes both that the code segment is
>> readable and that the hash function is in the middle of a much larger code
>> section.
>
> Yes, this feature seems "at edge", but will works correctly always,
> because of array index will never come over 512, and hence - if 512
> words after label "oleg_h:" will belong to _any_ code or static RO-data
> (not dynamic data), it will be read correctly, and can be used as
> randomization table.
>
> In the your file, hash functions placed in the begin of the file, and total
> bodies of another functions many times overlaps 512 words. And no matter,
> which function code uses for randomization table - "oleg_h" or something
> else.

512 words is 4KB, which is considerably larger than the whole hash.o
object..

size hash.o
text data bss dec hex filename
1698 48 0 1746 6d2 hash.o

> You can (theoretically) catch problems with this function, if:
> - It will be at end of the file hash.c
> AND
> - File hash.o will be linked last in the linker command
> AND
> - Body of the function will be placed at end of the memory page.

Code segment not readable on the platform in question

> If you continue to dislike body segment reading, you can use special garbage
> filled pseudorandom tabe instead of function body.
>
> I only thinked - using code bpdy is nice art feature, for twice use program
> code, which already has been loaded into CPU cache L1.
>
> If you'll decide to use pseudorandom table, I suggest you to fill that table
> by pseudorandom values,
> fetched from rand() function at initialization time, by way like:
>
> static unsigned int *tab;
>
> tab = malloc(512 * sizeof(int));
>
> srand(getpid() ^ time());
> for(unsigned *p = tab; p < tab + 512; p++)
> *p = rand();
>
> This way prevents your program from attempt to analyze code and make any type
> of attacks, based of hash weakness.
>
> --------------------
>
> Also, I see, you use "chain collision resolve" method in the you
> "hash_lookup" function.
>
> I have another suggestion for you:
> I think, for your purposes, need to use "open adresation" schema, instead of
> "chains".
> It will save memory/cpu overhead, which you have now, every time when you do
> malloc/free for small chain element.
Received on Sun Oct 24 2004 - 18:14:46 MDT

This archive was generated by hypermail pre-2.1.9 : Sun Oct 31 2004 - 12:00:02 MST