Re: ICP & HEAD requests

From: Chris Wedgwood <chris@dont-contact.us>
Date: Tue, 8 Sep 1998 09:06:52 +1200

[Umm... this is a bit muddled. I started with a brief message and
 part of my soon-to-come brain dump sort of evolved out of it]

On Mon, Sep 07, 1998 at 10:01:20AM -0600, Alex Rousskov wrote:

> Agree. However, storing partial objects and merging parts as they
> come is complicated since it goes against the basic Squid assumption
> about the object being a single continuous piece of data identified
> by its URL.

Requiring that squid be able to store objects as n byte-ranges, each
starting at s_n and ending at e_n, where n can be very large and no
restrictions are placed on s_n or e_n, would indeed be very hard.

But if we were to either limit n to something `reasonable' (which
presumably would have to be user-definable, perhaps defaulting to 8),
or require `s_n == 0 mod HUNKSIZE' where HUNKSIZE might default to
32K or so, then things might not be so bad.
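
To make the HUNKSIZE idea concrete, here is a hypothetical sketch
(not Squid code, all names made up): each object carries a bitmap of
which hunks it actually holds, and a range is serviceable only if
every hunk it touches is present.

/*
 * Hypothetical sketch only, not Squid code: one bit per
 * HUNKSIZE-aligned hunk, set when that hunk is on disk.
 */
#include <stddef.h>

#define HUNKSIZE (32 * 1024)

typedef struct {
    size_t object_size;       /* full entity length, if known */
    unsigned char *hunk_map;  /* one bit per hunk; 1 = hunk on disk */
} PartialObject;

static int
hunk_present(const PartialObject *po, size_t hunk)
{
    return (po->hunk_map[hunk / 8] >> (hunk % 8)) & 1;
}

/* Can we serve bytes [start, end] entirely from the cached hunks? */
static int
range_is_cached(const PartialObject *po, size_t start, size_t end)
{
    size_t h;

    for (h = start / HUNKSIZE; h <= end / HUNKSIZE; h++)
        if (!hunk_present(po, h))
            return 0;
    return 1;
}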

However, all of this makes storage more complex (although if we
assume the fs supports sparse files we can cheat a bit), and it also
makes the design criteria for squidFS that much more complex.
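
The sparse-file cheat could then be as little as writing each hunk at
its natural offset and letting the filesystem keep holes for the
hunks we never fetch. Again, just a sketch, assuming POSIX pwrite():

/*
 * Sketch of the sparse-file cheat: write each hunk at its natural
 * byte offset in one swap file; the filesystem keeps holes for the
 * hunks we never fetched.  Plain POSIX, no Squid internals assumed.
 */
#include <sys/types.h>
#include <unistd.h>

#define HUNKSIZE (32 * 1024)

/* Store one complete hunk; returns 0 on success, -1 on error. */
static int
store_hunk(int fd, size_t hunk, const char *buf)
{
    off_t offset = (off_t) hunk * HUNKSIZE;

    return pwrite(fd, buf, HUNKSIZE, offset) == HUNKSIZE ? 0 : -1;
}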

Can anybody tell me, from their logs or whatever, how often
byte-ranges are requested (byte-ranges which do not cover the entire
document, that is)?

> The problem with ranges is that some applications rely on their
> efficient support. We had complaints from content providers who put
> huge .pdf documents on their sites, and then are surprised why it
> takes a long time for Acrobat Reader plugin to access a random page
> within a document.

Ooo... I wasn't aware ranges were used all that often. Indeed, if
this is the case, it may well need looking at sometime. I know IE4
uses ranges to restart transmissions that have been truncated,
although in this case, squid does the right thing by retrieving the
entire document.
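
That restart case doesn't strictly need partial storage: the cache
only has to recognise the Range header, fetch (or keep) the complete
object, and then decide what to hand back. A naive parse of the
single-range forms, with a made-up helper name, might be:

/*
 * Hypothetical helper, not Squid's actual parsing code: recognise the
 * single-range forms "bytes=START-" and "bytes=START-END".
 */
#include <stdio.h>

/* Returns 1 on a parse, leaving *end == -1 for an open-ended range. */
static int
parse_simple_range(const char *value, long *start, long *end)
{
    *start = -1;
    *end = -1;
    if (sscanf(value, "bytes=%ld-%ld", start, end) < 1)
        return 0;
    return *start >= 0;   /* reject suffix ranges like "bytes=-500" */
}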

> There are a few heuristics that we can add to Squid to improve the
> performance in similar scenarios, but nothing will work in 100% of cases
> until storage of partial objects is implemented.

I think this is easier said than done. I'd love to hear what other
people think is viable; it looks like it could be fairly non-trivial
to me.

I've got some ideas on how we could possibly make squid scale better
on large machines with multiple processors, using stochastic load
balancing: spread the FDs over multiple threads/processors and
alternate who gets to accept new connections. I'm not entirely sure
how to handle ICP efficiently in this case though...
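
To put some flesh on that, a purely illustrative sketch (none of this
exists in Squid): each worker thread services its own set of FDs, and
only the worker that wins a trylock on the listen socket pulls in the
next connection and keeps it for itself.

/*
 * Purely illustrative: handle_own_fds() and add_fd_to_this_thread()
 * are made-up stand-ins for a per-thread event loop and an FD
 * ownership table.
 */
#include <stddef.h>
#include <pthread.h>
#include <sys/socket.h>

static pthread_mutex_t accept_lock = PTHREAD_MUTEX_INITIALIZER;
static int listen_fd;                    /* made non-blocking at startup */

extern void handle_own_fds(void);        /* hypothetical per-thread event loop */
extern void add_fd_to_this_thread(int fd);

static void *
worker(void *arg)
{
    (void) arg;
    for (;;) {
        /* Only one worker at a time tries to accept; the rest carry on. */
        if (pthread_mutex_trylock(&accept_lock) == 0) {
            int fd = accept(listen_fd, NULL, NULL);
            pthread_mutex_unlock(&accept_lock);
            if (fd >= 0)
                add_fd_to_this_thread(fd);
        }
        handle_own_fds();
    }
    return NULL;
}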

I also have a sketchy design for a way to reduce ICP traffic between
large sites (but not necessarily within them) by having squids
proxy/concentrate ICP packets (it's more complex than this, and still
a somewhat naive approach when compared to Summary Cache and perhaps
Digests). The concept was originally designed to solve another issue,
namely that peers should be allowed to lie about hits when they think
they've given more data to a site than they have received, so it
doesn't completely solve many of the problems with ICP, but nor does
it require you to maintain state about what another cache has.
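
Very roughly, the relaying half of that could look something like the
naive sketch below. It assumes the ICPv2 header layout from RFC 2186
(reqnum at bytes 4-7, network byte order) and leaves out timeouts,
table eviction and all error handling:

/*
 * Naive sketch of the relaying half of an ICP concentrator: queries
 * from local caches fan out to remote peers, and replies are matched
 * back to the asker by the ICPv2 reqnum field.
 */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define MAX_PENDING 1024

struct pending {
    uint32_t reqnum;
    struct sockaddr_in asker;   /* the local cache awaiting this answer */
};

static struct pending table[MAX_PENDING];

static uint32_t
icp_reqnum(const char *pkt)
{
    uint32_t n;
    memcpy(&n, pkt + 4, sizeof(n));
    return ntohl(n);
}

/* Relay a query from a local cache to every remote peer. */
static void
relay_query(int sock, const char *pkt, size_t len,
            const struct sockaddr_in *from,
            const struct sockaddr_in *peers, int npeers)
{
    struct pending *p = &table[icp_reqnum(pkt) % MAX_PENDING];
    int i;

    p->reqnum = icp_reqnum(pkt);
    p->asker = *from;
    for (i = 0; i < npeers; i++)
        sendto(sock, pkt, len, 0,
               (const struct sockaddr *) &peers[i], sizeof(peers[i]));
}

/* Relay a reply from a remote peer back to the cache that asked. */
static void
relay_reply(int sock, const char *pkt, size_t len)
{
    struct pending *p = &table[icp_reqnum(pkt) % MAX_PENDING];

    if (p->reqnum == icp_reqnum(pkt))
        sendto(sock, pkt, len, 0,
               (const struct sockaddr *) &p->asker, sizeof(p->asker));
}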

The critical question for this design, though, is: how often are ICP
packets dropped? If this is even 3% or so, it could really start to
degrade peering success (and make this approach pointless). It also
assumes local traffic and CPU aren't fairly limited resources.
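
As a back-of-envelope illustration (assuming, purely for argument's
sake, that concentration adds one extra UDP hop in each direction), a
3% per-packet drop rate compounds fairly quickly:

/*
 * Back-of-envelope check of how UDP loss compounds once ICP is
 * concentrated: a direct exchange is two packet transmissions
 * (query + reply), a relayed one is assumed to be four.
 */
#include <stdio.h>

int
main(void)
{
    double p = 0.03;                   /* assumed per-packet drop rate */
    double direct = (1 - p) * (1 - p); /* query + reply, no concentrator */
    double relayed = direct * direct;  /* two extra transmissions when relayed */

    printf("direct:  %.1f%% of exchanges complete\n", 100 * direct);
    printf("relayed: %.1f%% of exchanges complete\n", 100 * relayed);
    return 0;
}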

Any ideas on when squid-1.2 will freeze and we're allowed to start
breaking stuff left, right and center in squid-1.3?

-Chris