Re: [squid-users] "concurrency" attribute external_acl_type from Chris Robertson on 2009-04-06 (squid-users)

From: Chris Robertson <crobertson_at_gci.net>
Date: Mon, 06 Apr 2009 16:27:40 -0800

louis gonzales wrote:
> List,
> 1) for the "concurrency" attribute does this simply indicate how many
> items in a batch will be sent to the external helper?
>

No. There is no such thing as a "batch" in HTTP.

> 1.1) assuming concurrency is set to "6" for example, and let's assume
> a user's browser session sends out "7" actual URL's through the proxy
> request - does this mean "6" will go to the first instance of the
> external helper, and the "7th" will go to a second instance of the
> helper?
>

Yes.

> 1.1.1) Assuming the 6 from the first part of the batch return "OK" and
> the 7th returns "ERR", will the user's browser session, render the 6
> and not render the 7th?

Again, you use "batch". Every request passed to a helper that is not
blocked (by an http_access deny or an http_reply_access deny) will have
its response passed to the browser.
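
As a rough illustration of that point, here is a minimal Python sketch in
which seven requests are each judged on their own. The URLs and the
BLOCKED set are made up for the example and merely stand in for whatever
the real ACL lookup would be.

```python
# Hypothetical blocklist standing in for the per-user ACL table.
BLOCKED = {"http://ads.example.net/banner.gif"}

def verdict(url):
    """Judge one request on its own; no other request is consulted."""
    return "ERR" if url in BLOCKED else "OK"

# Seven requests triggered by one page view (made-up URLs).
requests = [f"http://www.example.com/img{i}.png" for i in range(6)]
requests.append("http://ads.example.net/banner.gif")

for url in requests:
    # Six lines start with OK, the last one with ERR.
    print(verdict(url), url)
```

The six allowed objects are delivered and the seventh is denied; neither
outcome depends on the other.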

> More importantly, how does Squid know that
> the two batches - one of 6, and one with 1, for the 7 total, know that
> all 7 came from the same browser session?
>

It doesn't.

> What I have currently:
> - openldap with postgresql, used for my "user database", which permits
> me to use the "auth_param squid_ldap_auth" module to authenticate my
> users with.
> - a postgresql database storing my acl's for the given user database
>
> Process:
> Step1: user authenticates through squid_ldap_auth
> Step2: the user requested URL(and obviously all images, content, ...)
> get passed to the external helper
>

This is where you go awry. The user requested URL
(http://www.google.com) will be passed to the helper. If that URL
results in an OK being passed back and nothing else prevents this
request, the contents of that URL will be passed back to the browser.
The browser will interpret the web page, and make a number of additional
requests (in this example, that would include the Google logo and some
sourced JavaScript). Each of those requests will be handled in a like
manner (perhaps resulting in still additional requests, such as
JavaScript requesting images).
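
To make the follow-up requests concrete, the sketch below parses a small,
made-up HTML page (not Google's real markup) and lists the URLs a browser
would go on to request. Each of them would reach the helper as a separate,
unrelated lookup. The class name SubresourceFinder is invented for this
example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class SubresourceFinder(HTMLParser):
    """Collect the image and script URLs referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Resolve relative references against the page URL.
            self.urls.append(urljoin(self.base_url, attrs["src"]))

# Made-up page body used only for illustration.
page = ('<html><body><img src="/logo.gif">'
        '<script src="/main.js"></script></body></html>')

finder = SubresourceFinder("http://www.google.com/")
finder.feed(page)
for url in finder.urls:
    print(url)  # each of these becomes its own request through the proxy
```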

> Step3: external helper checks those URL's against the database for the
> specific user and then determines "OK" or "ERR"
>
> Issue1:
> How to have the user requested URL(and all images, content, ...) get
> passed as a batch/bundle, to a single external helper instance, so I
> can collectively determine "OK" or "ERR"
>

This is impossible due to the nature of the HTTP protocol. There is no
such thing as a "batch" or a "session". Cookies were implemented to
bypass this on a per-site basis.

> Any ideas? Is the "concurrency" attribute to declare a maximum number
> of "requests" that go to a single external helper instance?

Concurrency is the maximum number of *simultaneous* requests that a
single external helper will handle.

> So if I
> set concurrency to 15, should I have the external helper read count++
> while STDIN lines come in, until no more, then I know I have X number
> in a batch/bundle?
>

No. There would be no way to know that the 15 requests are in any way
related. The helper would allow or deny all 15 based on whether any one
of them is or is not okay, and it would block waiting for a full queue
of 15 requests before handling any.
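
The stall can be shown with a small simulation. In this Python sketch,
batching_helper and per_request_helper are hypothetical names, the request
lines are made up, and both functions answer OK unconditionally because
only the timing of the replies matters here.

```python
def batching_helper(lines, batch_size):
    """The design from the question: hold replies until a full batch arrives."""
    pending = []
    for line in lines:
        pending.append(line)
        if len(pending) == batch_size:
            for held in pending:
                yield held.split()[0] + " OK"
            pending = []
    # Anything still in `pending` never gets a reply. A real helper would
    # sit blocked on stdin at this point instead of returning.

def per_request_helper(lines):
    """Answer every line as soon as it is read."""
    for line in lines:
        yield line.split()[0] + " OK"

# Seven made-up request lines: channel tag, user, URL.
requests = [f"{ch} alice http://www.example.com/item{ch}" for ch in range(7)]

print(len(list(batching_helper(requests, 15))))  # 0
print(len(list(per_request_helper(requests))))   # 7
```

With seven requests and a batch size of 15, the batching version never
answers at all, so every one of those requests would hang.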

> Obviously there is no way to predetermine how many URL's/URI's will
> need to be checked against the database, so if I set concurrency to
> 1024, "presuming to be high enough" that no single request will max it
> out, then I can just count++ and when the external helper is done
> counting STDIN readlines, I can process to determine "OK" or "ERR" for
> that specific request?
>

Raising the number to 1024 would (hopefully, by now, obviously) be an
even worse idea.

> Issue2:
> I'd like to just have a single external helper instance start up, that
> can fork() and deal with each URL/URI request,

That is exactly what concurrency expects.

> however, I'm not sure
> Squid in its current incarnation passes enough information OR doesn't
> permit specific enough passback (from the helper) information, to make
> this happen.
>

A concurrency-enabled helper is passed a "query channel" tag at the start
of each request line, and is expected to pass the same tag back at the
start of its reply, so that Squid can tell which response corresponds to
which request.

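A minimal sketch of such a helper in Python follows. It assumes the
external_acl_type line uses a format of %LOGIN %URI, so each request line
is "channel user url", and it uses a thread pool rather than fork(). The
ALLOWED set is a hypothetical stand-in for the PostgreSQL lookup, and the
names check, answer, serve and main are invented for this example.

```python
import sys
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the per-user ACL query.
ALLOWED = {("alice", "http://www.google.com/")}

def check(user, url):
    return "OK" if (user, url) in ALLOWED else "ERR"

def answer(line):
    """Build the reply for one request line, echoing its channel tag."""
    parts = line.split()
    if len(parts) < 3:
        # Malformed line: deny, but still echo the tag if one was sent.
        return (parts[0] + " ERR") if parts else "ERR"
    channel, user, url = parts[:3]
    return f"{channel} {check(user, url)}"

def serve(inp, out, workers=4):
    """Read request lines from inp and write tagged replies to out."""
    lock = threading.Lock()

    def handle(line):
        reply = answer(line)
        with lock:  # keep each reply line intact
            out.write(reply + "\n")
            out.flush()  # the caller is waiting on every single reply

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for line in inp:
            if line.strip():
                pool.submit(handle, line)

def main():
    serve(sys.stdin, sys.stdout)
```

Replies may come back in a different order than the requests arrived,
which is fine because the tag ties each one to its request. To run this as
an actual helper, call main() at the bottom of the script.
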
> Any deeper insights, would be tremendously appreciated.
>
> Thanks,
>

Chris
Received on Tue Apr 07 2009 - 00:27:48 MDT
