Re: Storing partial responses

From: Joe Cooper <joe@dont-contact.us>
Date: Tue, 12 Dec 2000 02:45:32 -0600

Hi Robert,

I won't volunteer to 'help' per se, but I will volunteer to peer
interestedly over your shoulder the whole time. I might even point at
stuff, and say "Hmmmm" on occasion.

OK, that's my way of saying this sounds like an interesting area of
work. I was just looking at that part of the code this past weekend
(which hasn't been touched since 2.2STABLE5+patch, as far as I could
tell) for a client who was curious about byte range requests and
Squid. What I'll actually do to help is try to keep up with your
development and pitch in where I can with testing and idea bouncing.
I might even attempt to write a line or two of code.

Let me know if there's anything I can do to help get things rolling.

First thoughts:

This probably requires a disk format or swap.state change, whether we
break the object up or keep the pieces all in one place.

It also breaks the storetree code that is going to be merged in from
the squidng project. storetree hashes the URL for storage indexing and
keeps no record of what was actually stored, so if an object is
incomplete, reiser_raw won't know that and will serve it incomplete. A
new field would be needed in reiser_raw to mark completeness, and I
guess the same applies to the standard object store, so chalk that one
up as a necessary change for DEVEL2.5.

Possibilities:

One option is rewriting files and expiring the old 'pieces' as new
pieces come in, until we have a complete object. That adds significant
write overhead, but it keeps the index simple, and objects end up being
as large as they can be, which reduces read and seek overhead. I think
we should avoid fragmenting the object store, if possible, for
performance reasons, but that's just a hunch; I could be wrong, since
range requests are mostly used on big objects anyway, I guess? This
plan also doesn't account for 'holes' in the ranges being requested.
How likely is that? And in such a case, would it be wise to just accept
fetching the whole object (or the data between two stored pieces)
rather than take on the complexity of keeping several separate parts of
one object at different locations in the store?
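
Just to convince myself the rewrite-and-merge idea hangs together,
here's a toy routine that folds a newly fetched piece into the range
list from the partial_meta sketch above, coalescing anything it
overlaps or touches. Again, made-up names, not real Squid code:

/*
 * Toy sketch: merge a newly fetched piece into the sorted,
 * non-overlapping range list, coalescing adjacent or overlapping pieces.
 */
#include <string.h>

static void
merge_range(struct partial_meta *m, uint64_t off, uint64_t len)
{
    struct partial_range out[PARTIAL_MAX_RANGES + 1];
    uint64_t new_start = off;
    uint64_t new_end = off + len;
    uint32_t n = 0, i = 0;

    /* keep pieces that end strictly before the new piece begins */
    while (i < m->range_count &&
           m->ranges[i].offset + m->ranges[i].length < new_start)
        out[n++] = m->ranges[i++];

    /* absorb every piece that overlaps or touches the new one */
    while (i < m->range_count && m->ranges[i].offset <= new_end) {
        uint64_t piece_end = m->ranges[i].offset + m->ranges[i].length;
        if (m->ranges[i].offset < new_start)
            new_start = m->ranges[i].offset;
        if (piece_end > new_end)
            new_end = piece_end;
        i++;
    }
    out[n].offset = new_start;
    out[n].length = new_end - new_start;
    n++;

    /* keep pieces that start after the merged piece ends */
    while (i < m->range_count)
        out[n++] = m->ranges[i++];

    if (n > PARTIAL_MAX_RANGES)
        return;       /* out of slots; a real store would rewrite or evict */

    memcpy(m->ranges, out, n * sizeof(out[0]));
    m->range_count = n;
    if (m->entity_length && n == 1 &&
        out[0].offset == 0 && out[0].length == m->entity_length)
        m->complete = 1;   /* the pieces now cover the whole entity */
}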

I'll stop talking for now until I actually understand what I'm talking
about.

Robert Collins wrote:

> Hi everyone,
> I've added a new branch on sourceforge for working on storing and returning HITS from partial responses. I don't know how fast
> I'll move it along :]. If anyone wants to collaborate on it then fantastic.
>
> for reference, the tag is storepartial.
>
> My rough approach plan is to
> a) get strong validation working. (if it's not already)
> b) figure out how best to store multiple sections from a URL in the object store. I.e. should we consider each non-overlapping range a
> separate URI? Or perhaps store a series of sections in the on-disk object with common details in the meta data and then 1..n sections
> of defined length and offset? I haven't put a great deal of effort into this, and I'm hoping to avoid invalidating the existing
> caches when it happens.
> c) get squid caching the partial responses and serving hits that are completely covered by in-cache range data.
> d) look into extra optimisations (for example, if we have a partial response in cache, ask the origin for a HEAD, and if a strong
> validator comparison succeeds send one or more range requests to the origin, fulfilling the client from the store and from the
> origin.)
>
> Rob
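
Thinking about your point (c): the hit path really only needs to know
whether the client's requested range falls entirely inside the pieces
we already hold. A toy check, reusing the made-up partial_meta sketch
from above (not real Squid code):

/*
 * Toy check for point (c): can this byte range be served entirely from
 * the pieces already in the store?
 */
static int
range_is_cached(const struct partial_meta *m, uint64_t off, uint64_t len)
{
    uint32_t i;

    if (m->complete)
        return 1;
    for (i = 0; i < m->range_count; i++) {
        const struct partial_range *r = &m->ranges[i];
        if (off >= r->offset && off + len <= r->offset + r->length)
            return 1;          /* wholly inside one stored piece */
    }
    return 0;                  /* needs an upstream fetch */
}

If that says no, we'd fall back to your point (d): HEAD the origin,
compare the strong validator, and range-fetch only the missing bytes.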

                                   --
                      Joe Cooper <joe@swelltech.com>
                  Affordable Web Caching Proxy Appliances
                         http://www.swelltech.com