Re: [squid-users] cache peer: hit, miss and reject

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 04 Sep 2013 16:35:13 +1200

On 4/09/2013 7:14 a.m., Niki Gorchilov wrote:
> Hi.
>
> Is there a way for a cache peer to reject a particular request from
> squid via ICP opcode, other than ICP MISS?
>
> In the current scenario, some URL regex ACLs are passed through our
> custom cache peer. Both ICP HIT and MISS are used to put the proper DSCP
> mark back to the user via the qos_flows parent-hit directive.

That is not what ICP HIT/MISS do. It is the act of using the peer that
triggers the DSCP value assignment, via your qos_flows config.

> Now we want the cache peer to be able to reject some requests, thus
> forcing squid to serve them directly. Two applications for this
> feature:
> 1. The cache peer knows in advance the requested object is not
> cacheable at all. No need of passing the request via second proxy -
> extra CPU load, extra delay for the user, extra sockets, etc, etc..

This is the meaning of ICP_MISS.

> 2. We know that 50% of the objects in our cache never get requested
> second time, thus only creating load on the system to store and later
> to evict them.

How did you get to that conclusion please?
  What Squid version are you using at present?

> So we prefer to be able to cache on second, third,
> etc... request without passing the first requests via the peer at all.

You understand that will possibly halve your caching efficiency, right?
It turns the 2-request URLs into MISS+MISS+... and leaves only the URLs
fetched 3 or more times worth caching...

Caching is at its core a tradeoff between storage delays and bandwidth
delays. If you explicitly weight it toward bandwidth delays by not
caching things on first request, the benefits drop off very quickly.
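To make the tradeoff concrete, here is a small worked example with a
made-up request distribution (the numbers are illustrative assumptions,
not measurements from any real cache):

```python
# Illustrative arithmetic: suppose 100 URLs are requested, of which
# 50 are fetched once, 30 twice, and 20 three times.
counts = [1] * 50 + [2] * 30 + [3] * 20

total_requests = sum(counts)                  # 50 + 60 + 60 = 170

# Normal policy: cache on first request, so every repeat is a HIT.
hits_normal = sum(c - 1 for c in counts)      # 30 + 40 = 70

# Skip-first-request policy: the first fetch is never stored, so a URL
# must be requested at least 3 times before any HIT occurs.
hits_skip_first = sum(max(c - 2, 0) for c in counts)  # only the 3x URLs: 20

print(hits_normal / total_requests)       # ~0.41 hit ratio
print(hits_skip_first / total_requests)   # ~0.12 hit ratio
```

With this distribution the hit ratio drops from roughly 41% to roughly
12% - all of the 2-request URLs become pure MISS traffic.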

> Why? Same reasons as above.... ICP is cheap enough for statistics and
> decision making...

This is not possible with ICP as far as I know.

It is also worth noting that ICP does not send any of the HTTP headers
to the peer - so many of the HTTP/1.1 features like Vary, ETag,
conditional requests etc (even HTTP/1.0 Accept negotiation) will fail in
strange ways. You need HTCP protocol for those features to operate
properly between the peers. One URL may be getting two requests but for
different variants.

ICP is simply used to send a boolean value back to the querying proxy
indicating whether the queried proxy has a URL _existing_ in its cache.
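The wire format makes that limitation visible: an ICP query carries only
a URL, never HTTP headers. A minimal sketch of the datagram layout,
based on my reading of RFC 2186 (the opcode values and 20-byte header
are from that spec; treat the details as an illustration, not a
reference implementation):

```python
import struct

# ICP opcodes as defined in RFC 2186
ICP_OP_QUERY = 1
ICP_OP_HIT = 2
ICP_OP_MISS = 3

def build_icp_query(reqnum: int, url: str) -> bytes:
    """Build an ICP_OP_QUERY datagram: 20-byte header, then a 4-byte
    requester-host field and the NUL-terminated URL. Note there is no
    room for Vary, ETag, or any other HTTP header."""
    payload = struct.pack("!I", 0) + url.encode() + b"\0"
    length = 20 + len(payload)
    header = struct.pack("!BBHIIII",
                         ICP_OP_QUERY,  # opcode
                         2,             # ICP version 2
                         length,        # total message length
                         reqnum,        # request number, echoed in the reply
                         0, 0, 0)       # options, option data, sender addr
    return header + payload

def parse_icp_reply(data: bytes):
    """Return (opcode, reqnum) from a reply. The peer answers HIT or
    MISS: a pure existence test on the URL."""
    opcode, _ver, _length, reqnum, _opt, _optdata, _sender = \
        struct.unpack("!BBHIIII", data[:20])
    return opcode, reqnum
```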

> We've played with other ICP opcodes like ICP_OP_MISS_NOFETCH,
> ICP_OP_DENIED, etc. without positive effect. Either the actual HTTP
> requests keep coming or Squid assumes the cache peer is misconfigured
> and flags it as a dead parent.

Those opcodes have explicit meanings: NOFETCH - "busy, do not use this
peer for a while."; DENIED - "do not use this peer. Absolutely forbidden.".

> Any ideas how to resolve my issue and offload the cache peer by at
> least 50% of the requests it serves currently?

Answer: Do not use cache existence test(s) to solve access control and
routing problems.

I would use an external_acl_type helper to do the calculation about
whether a request is to be cached and set a tag=value on the
transaction. The tag type ACL can then test for this tag and do a "cache
deny". Since you have all traffic passing through Squid anyway, the
helper will see every request.

Something like this:

   # helper returns "OK tag=first-seen" or just "OK"
   external_acl_type tagger ttl=0 %URL ...
   acl firstSeen external tagger
   acl taggedFirst tag first-seen
   # "!all" never matches, so nothing is actually denied here; the line
   # only forces the external ACL lookup so the tag gets attached
   http_access deny firstSeen !all
   cache deny !taggedFirst
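A minimal sketch of such a helper, assuming %URL is the only format
token, no concurrency channel-ID, and the tagging direction shown in the
config above (the name and in-memory set are illustrative; a real
deployment would want a bounded or expiring store):

```python
#!/usr/bin/env python3
"""Hypothetical external ACL helper: receives one %URL per line from
Squid, answers "OK tag=first-seen" the first time a URL appears and a
plain "OK" afterwards."""
import sys

def classify(url: str, seen: set) -> str:
    """Tag a URL on its first appearance, remember it for later."""
    if url in seen:
        return "OK"
    seen.add(url)
    return "OK tag=first-seen"

def main():
    seen = set()
    for line in sys.stdin:
        url = line.strip()
        # one reply line per request line; flush so Squid is not blocked
        print(classify(url, seen), flush=True)

if __name__ == "__main__":
    main()
```

Whichever way round the tagging goes, the key point is that the
tag=value travels with the transaction, so the later "cache" directive
can test it without a second helper lookup.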

Amos
Received on Wed Sep 04 2013 - 04:35:21 MDT
