Re: How long is a domain or url can be?

From: Eliezer Croitoru <eliezer_at_ngtech.co.il>
Date: Wed, 30 Apr 2014 21:12:07 +0300

On 04/30/2014 11:52 AM, Henrik Nordström wrote:
> Unless it has been fixed the UFS based stores also have an implicit
> limit on cached entries somewhat less than 4KB (whole meta header need
> to fit in first 4KB). Entries failing this gets cached but can never get
> hit.
So StoreID helps a bit with that.
That also explains why some URLs with a "?" in them sometimes do not
cache well :P

>> > DNS defines X.Y.Z segments as being no longer than 255 bytes *each*.
> For Internet host names the limits are 63 octets per label, and 255
> octets in total including dot delimiters.

This is indeed what I have read in the RFC, and it makes the regex for
a domain simpler to define.
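
As a rough illustration of how those limits simplify the check, here is
a minimal Python sketch. It uses the 63-per-label and 255-total figures
quoted above (some implementations apply a tighter 253-character limit
to the textual form) and assumes plain ASCII letter/digit/hyphen labels.
The names LABEL_RE, MAX_NAME_OCTETS and is_valid_hostname are made up
for this example.

```python
import re

# One label: 1-63 characters, letters/digits/hyphens only,
# not starting or ending with a hyphen (assumed host name rule).
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

MAX_NAME_OCTETS = 255  # total limit as quoted in the thread, dots included


def is_valid_hostname(name: str) -> bool:
    """Return True if name fits the per-label and total length limits."""
    # A single trailing dot (fully qualified form) is tolerated; drop it.
    if name.endswith("."):
        name = name[:-1]
    if not name or len(name) > MAX_NAME_OCTETS:
        return False
    # Every dot-separated label must match the label pattern in full.
    return all(LABEL_RE.fullmatch(label) for label in name.split("."))
```

Splitting on dots and checking each label keeps the regex itself short,
instead of encoding the total-length limit inside one large pattern.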
From what I have seen, 2-3KB is the upper limit of request sizes
actually in use.
I assume this reflects the data sizes currently seen on the network.
Every once in a while data sizes go up, and URL sizes should too, since
URLs will be used with hash algorithms of bigger sizes.
It started small and then grew: crc16, crc32, mdX, md5, sha1, sha512,
etc.
So for now a URL blacklist should handle URLs of at least 4KB, but I
think that doubling from 4KB to 8KB is not such a big jump.
The main issue I was considering is the choice between a DB field of
size X and another field type that has indexes.

For now I have used a MySQL TEXT column without an index, but only the
first query takes more than 0.00 ms.
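
One possible way around the trade-off between an unindexed long field
and an indexed fixed-size one is to index a fixed-length digest of the
URL and keep the full URL in a plain TEXT column. The sketch below is
only an illustration of that idea, not what I actually run: it uses
Python's built-in sqlite3 as a stand-in for MySQL, SHA-256 as an
assumed hash choice, and made-up table and function names.

```python
import hashlib
import sqlite3


def url_key(url: str) -> bytes:
    """Fixed-length (32-byte) SHA-256 digest used as the indexed key."""
    return hashlib.sha256(url.encode("utf-8")).digest()


def open_blacklist(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # The digest is the primary key (and therefore indexed); the full
    # URL sits in an unindexed TEXT column, so its length is not tied
    # to any index size limit.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blacklist ("
        " url_hash BLOB PRIMARY KEY,"
        " url TEXT NOT NULL)"
    )
    return conn


def add_url(conn: sqlite3.Connection, url: str) -> None:
    conn.execute(
        "INSERT OR IGNORE INTO blacklist (url_hash, url) VALUES (?, ?)",
        (url_key(url), url),
    )


def is_blacklisted(conn: sqlite3.Connection, url: str) -> bool:
    row = conn.execute(
        "SELECT url FROM blacklist WHERE url_hash = ?", (url_key(url),)
    ).fetchone()
    # Compare the stored URL as well, to guard against a digest collision.
    return row is not None and row[0] == url
```

With this layout an 8KB URL costs the same 32 bytes of index as a short
one, at the price of supporting exact-match lookups only.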

I have tried a couple of key-value DBs and other DBs, but all of them
seem to have some kind of step that is the slowest, after which they
run fast.

I have compared MySQL to key-value stores, and the main difference is
the on-disk size of the DB, which is important if there is a plan to
filter many specific URLs and not only patterns.

Amos (or anyone else): since you patched SquidGuard, maybe you have a
basic understanding of its lookup algorithm?
I started reading the SquidGuard code, but then all of a sudden I lost
my way in it and it got a bit complicated (for me) to understand how
they do it. (Hints are welcome.)

Thanks,
Eliezer

> Regards
> Henrik
Received on Wed Apr 30 2014 - 18:13:16 MDT
