Re: [squid-users] HTTP Cache General Question from Amos Jeffries on 2009-10-08 (squid-users)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 09 Oct 2009 18:26:47 +1300

CC'ing squid-dev so the other developers can get a look and comment

Mark Schall wrote:
> Thank you for the information.
>
> One more question:
>
> We're researching whether it is possible to cache P2P data in an
> HTTP cache (purely research). Our assumption is that if we send an
> HTTP request to an IP address (one peer, 1.2.3.4) with a URI in the
> header that does not correlate with that IP address, the web cache
> would store the response based on the URI in the header. That way,
> if we sent a request to a different peer (5.6.7.8) with the same URI
> in the header, we'd get back the cached data.
>
> I know this is a big assumption, and we would change our approach if
> it is not true, but it seems logical for it to work this way. Do you
> know if Squid works in this way?

I've given this a small amount of thought over the last few years, and
bounced the idea off Adrian after hours at a conference last year.

We came to the conclusion that it would be a very difficult thing to do
in Squid as the code currently stands.

It is theoretically possible, and relatively easy, to add a P2P port and
an engine to handle the arriving requests. Caching the objects like any
others is also feasible: the URI can be provided as you say, or even
created as needed directly from the P2P metadata in the .torrent case.
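
As a rough illustration of creating a URI from .torrent metadata, here is
a minimal Python sketch. It assumes the cache is handed the raw bencoded
"info" dictionary from the .torrent file and uses its SHA-1 (the
BitTorrent info hash) in the urn:btih form known from magnet links. The
function name torrent_cache_uri and the "?piece=" suffix are hypothetical
and not part of Squid or of any P2P standard.

```python
import hashlib
from typing import Optional


def torrent_cache_uri(bencoded_info: bytes, piece_index: Optional[int] = None) -> str:
    """Build a stable cache URI from the bencoded 'info' dictionary.

    The same torrent always yields the same URI, whichever peer served it.
    """
    # The BitTorrent info hash is the SHA-1 of the bencoded info dictionary.
    info_hash = hashlib.sha1(bencoded_info).hexdigest()
    uri = "urn:btih:" + info_hash
    if piece_index is not None:
        # Hypothetical per-piece addressing, invented for this sketch only.
        uri += "?piece=%d" % piece_index
    return uri


if __name__ == "__main__":
    # Placeholder bytes standing in for a real bencoded info dictionary.
    sample_info = b"d6:lengthi10e4:name4:demoe"
    print(torrent_cache_uri(sample_info))
    print(torrent_cache_uri(sample_info, piece_index=3))
```

Because the URI depends only on the torrent metadata, two requests for
the same content map to the same cache entry regardless of which peer
they were addressed to.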

The major blocker is that Squid cache storage does not yet support
partial ranges of objects. This is already a big problem for HTTP and
becomes a critical issue if P2P downloads are added. It essentially
means that the segments of P2P files cannot be fetched in parallel from
multiple sources, which breaks the best benefit P2P would bring to
Squid. The P2P files could still be fetched linearly by Squid, however.
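
To make the partial-range requirement concrete, here is a small Python
sketch of the bookkeeping a store would need in order to accept segments
out of order from several sources. It is not Squid code and says nothing
about how Squid would implement it; the class PartialObject and its
methods are hypothetical names used only for illustration.

```python
from typing import List, Tuple


class PartialObject:
    """Holds one object whose byte ranges may arrive in any order."""

    def __init__(self, total_size: int) -> None:
        self.total_size = total_size
        self.data = bytearray(total_size)
        # Sorted, non-overlapping [start, end) ranges received so far.
        self.ranges: List[Tuple[int, int]] = []

    def write(self, offset: int, chunk: bytes) -> None:
        """Store a segment and merge it with overlapping or adjacent ranges."""
        if not chunk:
            return
        start, end = offset, offset + len(chunk)
        if start < 0 or end > self.total_size:
            raise ValueError("segment lies outside the object")
        self.data[start:end] = chunk
        merged = []
        for s, e in self.ranges:
            if e < start or s > end:
                merged.append((s, e))  # disjoint, keep unchanged
            else:
                start, end = min(start, s), max(end, e)  # absorb into new range
        merged.append((start, end))
        merged.sort()
        self.ranges = merged

    def missing(self) -> List[Tuple[int, int]]:
        """Return the [start, end) gaps that still need to be fetched."""
        gaps, pos = [], 0
        for s, e in self.ranges:
            if s > pos:
                gaps.append((pos, s))
            pos = e
        if pos < self.total_size:
            gaps.append((pos, self.total_size))
        return gaps

    def is_complete(self) -> bool:
        return not self.missing()


if __name__ == "__main__":
    obj = PartialObject(10)
    obj.write(5, b"56789")  # second half arrives first, e.g. from one peer
    print(obj.missing())
    obj.write(0, b"01234")  # first half arrives later from another source
    print(obj.is_complete(), bytes(obj.data))
```

A store that only supports whole objects written front to back cannot
accept the first write() call above, which is why only linear fetching
remains possible.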

A lesser, but still significant, issue is the sheer size of P2P objects
and traffic. Caches are already filled with a lot of content from HTTP
alone; adding P2P requests to that would put a major burden on storage
space. This is more an obstacle for the admin, however.

Beyond that, there are a lot of small pieces of work needed to make Squid
capable of contacting P2P servers and peers, intercepting seed file
requests, etc.

>
> We also think it'd be possible for the cache to take the HTTP header,
> check to see if the URI is in the cache, and if not send the header to
> the domain in the header.

This is how proxies already work at a fundamental level.
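
The lookup-then-forward flow can be sketched in a few lines of Python.
This is a simplified illustration under stated assumptions, not Squid's
implementation: the class TinyCache, the dest_ip parameter, and the
injected fetch callable are hypothetical, and the key is simply a hash of
the URI (real caches may also mix in request method, Vary: headers, and
so on).

```python
import hashlib
from typing import Callable, Dict


class TinyCache:
    """Cache keyed by a hash of the request URI, not by destination IP."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # called on a miss to contact the origin
        self._store: Dict[str, bytes] = {}

    @staticmethod
    def key(uri: str) -> str:
        return hashlib.sha256(uri.encode("utf-8")).hexdigest()

    def get(self, uri: str, dest_ip: str) -> bytes:
        # dest_ip is deliberately not part of the key in this sketch.
        k = self.key(uri)
        if k not in self._store:
            # Miss: forward the request towards the host named in the URI.
            self._store[k] = self._fetch(uri)
        return self._store[k]


if __name__ == "__main__":
    calls = []

    def fake_origin(uri: str) -> bytes:
        calls.append(uri)
        return b"payload for " + uri.encode("utf-8")

    cache = TinyCache(fake_origin)
    cache.get("http://example.com/file", dest_ip="1.2.3.4")
    cache.get("http://example.com/file", dest_ip="5.6.7.8")
    print(len(calls))  # the origin was contacted once
```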

>
> Thank you again
>
> Mark Schall
> Michigan State University
> CSE Graduate Student
>
>
>
> On Wed, Oct 7, 2009 at 11:17 PM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
>> On Wed, 7 Oct 2009 11:24:29 -0400, Mark Schall <schallm2_at_msu.edu> wrote:
>>> Hi,
>>>
>>> My name is Mark Schall. I am a Master's student at Michigan State
>>> University. I am working with a group, trying to work with HTTP
>>> caches. We were wondering whether, in general, HTTP caches work by
>>> caching data based on the IP addresses or by the URI of the HTTP
>> In general? I won't dare to guess. Too many ways to do it and too many
>> different software caches using those ways.
>>
>> Squid in particular stores them by hash. Older versions used a hash of
>> the URL. Newer 2.x versions use a hash of the URL + some Vary: headers
>> and so on.
>>
>>> request. It seems that using IP addresses would be the most secure
>>> means of caching, but the URI seems logical for multiple server
>>> websites.
>> I assume by 'secure' you mean 'secure against data leaks'. There is nothing
>> inherently secure about caching in the first place. The cache admin always
>> has access to the cached data in intermediary traffic.
>> What security there is in caching is built on trust between the cache
>> admin and the website admin. The website admin trusts that the cache
>> admin will obey the Cache-Control (CC) headers. The cache admin trusts
>> that the website admin will set the headers correctly (private) to
>> protect sensitive information and also to inform how often objects get
>> replaced etc.
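
As a small illustration of the cache side of that trust, the Python
sketch below checks whether a shared cache may store a response based on
its Cache-Control value. It is a deliberately simplified assumption-laden
example: the function name shared_cache_may_store is hypothetical, the
parsing is naive (it does not handle commas inside quoted strings), and
it only looks at the "private" and "no-store" directives.

```python
def shared_cache_may_store(cache_control: str) -> bool:
    """Return False when the origin marked the response private or no-store."""
    directives = set()
    for part in cache_control.split(","):
        # Keep only the directive name, e.g. "max-age" from "max-age=3600".
        name = part.strip().split("=", 1)[0].strip().lower()
        if name:
            directives.add(name)
    return not ({"private", "no-store"} & directives)


if __name__ == "__main__":
    print(shared_cache_may_store("public, max-age=3600"))
    print(shared_cache_may_store("private, max-age=0"))
```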

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE7 or 3.0.STABLE19
   Current Beta Squid 3.1.0.14

Received on Fri Oct 09 2009 - 05:26:54 MDT