Re: [squid-users] Re: Accelerating Proxy options?

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 20 Apr 2011 12:59:47 +1200

 On Tue, 19 Apr 2011 13:31:38 -0700, Linda Walsh wrote:
> Amos Jeffries wrote:
>> On Mon, 18 Apr 2011 18:30:51 -0700, Linda Walsh wrote:
>>> [wondering about squid accelerator features such as...]
>>> 1) Parsing fetched webpages and looking for statically included
>>> content
>>> and starting a "fetch" on those files as soon as it determines
>>> page-requisites
>> Squid is designed not to touch the content. Doing so makes things
>> slower.
> ----
> Um, you mean: "Doing so can often make things slower." :-)

 Almost always. ;-)

>
> It depends on the relative CPU speed (specifically, the CPU speed
> of the machine where squid is being run) vs. the external line
> speed. Certainly, you would agree that if the external line
> speed is 30Bps, for example, Squid would have much greater latitude
> to "diddle" with the content before a performance impact would be
> noticed.
>
> I would agree that doing such processing "in-line" would create
> a performance impact, since even right now, with no such processing
> being done, I note squid impacting performance by about 10-30% over
> a direct connection to *fast* sites. However, I would only think
> about doing such work outside of the direct i/o chain via separate
> threads or processes.

 Not easily possible in Squid. The I/O chain for the body is currently
 almost exactly read FD 1 -> write FD 2.
 Making any changes at all inline means turning it into:
  read FD 1 -> buffer -> scan -> process -> copy result -> write FD 2.

 That is roughly a 2x-3x delay even when there is no change to be made.

 We get away with ~5% lag from chunked encoding because the chunks are
 predictable in advance and the intermediate bytes can drop down to that
 read->write efficiency within chunks.
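
 To illustrate the difference (a minimal sketch in C++, not Squid's
 actual code; scan_and_copy() is a hypothetical stand-in for any
 content filter, and error handling is omitted):

   #include <unistd.h>   // read(), write()

   // Hypothetical filter: scan 'n' bytes of 'in', write the
   // (possibly modified) result into 'out', return its length.
   ssize_t scan_and_copy(const char *in, ssize_t n, char *out);

   // Fast path: the body I/O chain as it is today. Bytes move from
   // the server socket to the client socket untouched.
   void relay(int server_fd, int client_fd) {
       char buf[65536];
       ssize_t n;
       while ((n = read(server_fd, buf, sizeof buf)) > 0)
           write(client_fd, buf, n);
   }

   // Inline-processing path: every buffer is scanned and copied
   // again before it can be written, even when nothing changes.
   void relay_with_scan(int server_fd, int client_fd) {
       char buf[65536], out[65536];
       ssize_t n;
       while ((n = read(server_fd, buf, sizeof buf)) > 0) {
           ssize_t m = scan_and_copy(buf, n, out);
           if (m > 0)
               write(client_fd, out, m);
       }
   }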

>
> Picture this: I (on a client sys) pull in a web page. At the same
> time I get it, it's handed over to a separate process running on a
> separate core

 read -> copy to reader thread buffer -> copy to processing thread
 buffer -> copy to result output buffer (maybe) -> copy to writer
 thread buffer -> write.

 That is a 2x slowdown *on top of* the above processing scan lags. This
 exact case of multiple copying is one of the two reasons we do not have
 threading in Squid.
  We are working instead towards the Apache model: one process fully
 handling a request transaction, with IPC callouts to linked workers
 which can provide shared details as needed. Threads may appear at the
 sub-transaction layer to handle specific tasks, decided on a
 case-by-case basis. More in the wiki under SmpSupport.
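
 For a flavour of where that is heading, the 3.2 SMP model is driven
 from squid.conf (a sketch only; see the SmpSupport wiki page for the
 authoritative details):

   # squid.conf (3.2-style): run 4 worker processes sharing the
   # listening ports.
   workers 4

   # Non-shared cache_dir storage must be split per worker; the
   # ${process_number} macro expands differently in each kid process.
   cache_dir ufs /var/cache/squid/${process_number} 1000 16 256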

> that begins processing. Even if the server and client parse at the
> same speed, the server would have an edge in formulating the
> "pre-fetch" requests simply because it's on the same physical machine
> and doesn't have any client-server latency. The server might have an
> additional edge since it would only be looking through fetched content
> for "pre-fetchables" and not concerning itself with rendering issues.
>
>> There are ICAP server apps and eCAP modules floating around that
>> people have written to plug into Squid and do it. The only public one
>> AFAICT is the one doing gzipping, the others are all proprietary or
>> private projects.
> ---
> Too bad there is no "CSAN" repository akin to perl's CPAN, as well
> as a seemingly different level of community motivation for adding to
> such a repository.
>
>>> 2. Another level would be pre-inclusion of included content for
>>> pages
>>> that have already been fetched and are in cache. [...]
>> ESI does this. But requires the website to support ESI syntax in the
>> page code.
> ---
> ESI? Is there a TLA URL for that? ;-)
>

 It is mostly server-side stuff.
 http://en.wikipedia.org/wiki/Edge_Side_Includes covers the ESI syntax
 etc.

 When the component is built into Squid (--enable-esi) it is pretty much
 automatic on reverse-proxy requests.

 The only applicable config in Squid is
 http://wiki.squid-cache.org/Features/Surrogate and for any
 well-configured reverse-proxy the visible_hostname (being a unique
 public FQDN) is the default advertised surrogate ID anyway.
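
 For a flavour of the syntax, an ESI-enabled page is ordinary markup
 with esi: tags marking the dynamic fragments (hostnames here are
 placeholders):

   <!-- Origin response carries:  Surrogate-Control: content="ESI/1.0"
        A Squid built with --enable-esi assembles the page at the edge. -->
   <html><body>
     <h1>Static shell, cacheable for hours</h1>
     <esi:include src="http://backend.example.com/fragments/news" />
   </body></html>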

>
> Anyway, just some wonderings...
> What will it take for Sq3 to get to the feature level of Sq2 and
> allow,

 What we are missing in a big way are the store-URL and location-URL
 re-writers.
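
 For reference, the store-URL feature looks like this in a 2.7
 squid.conf (2.7-only directives; the helper path is a placeholder for
 whatever script normalizes the URLs):

   # Map many URLs for the same object onto one cache key (2.7 only):
   storeurl_rewrite_program /usr/local/bin/storeurl.pl
   acl store_rewrite_list dstdomain .youtube.com
   storeurl_access allow store_rewrite_list
   storeurl_access deny all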

 And speed ... we have a mixed bag of benchmarks for the relative speed
 of 3.2 and 2.7. They appear pretty much equal, or 3.2 ahead, on some
 simple tests now. Some common components (i.e. ACLs and DNS) need a bit
 more speed optimization before 3.2 is ahead in general.

 ETag variants, collapsed forwarding, and background revalidation
 (leading to stale-while-revalidate support) would be nice to improve
 speed, but are not essential for the deprecation of 2.7.

> for example, caching of dynamic content?

 All Squid versions have that; it was only ever a configuration default.
 Though Squid's HTTP/1.1 caching compliance is dodgy with releases
 mid-series 2.6 and older.
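
 The default in question is a couple of squid.conf lines; roughly, the
 relevant part of a current default config (old QUERY acl lines shown
 for contrast):

   # Old default, which blocked caching of dynamic content outright:
   #   acl QUERY urlpath_regex cgi-bin \?
   #   cache deny QUERY
   # Current default: cache dynamic pages, but never serve them stale
   # without revalidation:
   refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
   refresh_pattern . 0 20% 4320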

>
> Also, what will it take for Sq3 to get full, included HTTP1.1
> support?

 Squid-3.1 speaks HTTP/1.1 to servers by default. Squid-3.2 does so to
 clients by default as well (AND has full chunked encoding support for
 persistent connections).

>
> It __seems__ like, though it's been out for years, it hasn't made
> much progress on those fronts. Are they simply not a priority?
>
> Especially getting to the 1st goal (Sq3>=Sq2), I would think, would
> consolidate community efforts at improvement and module construction
> (e.g. caching dynamic content like that from youtube and the
> associated wiki directions for doing so under Sq2, which are
> inapplicable to Sq3)...

 I've had a handful of people stick their hands up to do this over the
 last year or two. I pointed them at the squid-2 patches, which need
 adjusting to compile and work in the squid-3 code. Never heard from
 them again. :(

> (chomping at the bit for Sq2 to become obviated by Sq3)...

 Me too.

 Amos