Re: Store_url_rewrite for squid 3+

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 06 Sep 2012 12:04:23 -0600

On 09/05/2012 06:58 PM, Amos Jeffries wrote:
> On 06.09.2012 11:58, Eliezer Croitoru wrote:
> We can pause there for the infrastructure to look fine before moving on
> to the store details. I've been waiting on assistance from Henrik or
> Alex on that for a while. They are the ones who know the answers to your
> questions below AFAIK.

FWIW, I have not reviewed the store_url_rewrite code in Squid2 so I
cannot answer the questions related to how it was done. I can suggest
ways of doing this in Squid3, but since somebody already investigated
all the alternatives, it would be better to hear the summary of the
Squid2 implementation (as it relates to Store) before diving into Squid3
development.

The biggest question for me is why the Squid2 code was storing multiple
URLs with the cached object (if it was). Why can't Store just work with
the [rewritten] URL given to it and ignore the fact that some [store] URLs
originated from some other [real] URLs? Are we trying to support going
from a store_url_rewrite config back to a regular config without losing
some of the cached objects?
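
To make the question concrete, here is a minimal sketch in Python (not
Squid code). The rewrite rule, the host names, and the Store class are all
hypothetical; the only point is that the store sees nothing but the
[store] URL handed to it.

```python
import re

# Hypothetical rewrite rule: collapse numbered mirror hosts into one store URL.
MIRROR_RE = re.compile(r"^http://mirror\d+\.example\.com/")


def store_url(real_url: str) -> str:
    """Map a [real] URL to its [store] URL (assumed rule, for illustration)."""
    return MIRROR_RE.sub("http://mirrors.example.com.internal/", real_url)


class Store:
    """Toy store that is keyed only by the URL it is given."""

    def __init__(self):
        self._objects = {}

    def put(self, url, body):
        self._objects[url] = body

    def get(self, url):
        return self._objects.get(url)


store = Store()
# Cache a response fetched from one mirror...
store.put(store_url("http://mirror1.example.com/pkg.tar.gz"), b"payload")
# ...and look it up through a request sent to a different mirror.
hit = store.get(store_url("http://mirror2.example.com/pkg.tar.gz"))
# Without the rewrite, the same object is not found under its real URL.
miss = store.get("http://mirror1.example.com/pkg.tar.gz")
```

In this toy model, turning the rewrite off makes lookups by real URL miss,
which is the only reason I can think of for keeping both URLs with the
object.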

>> 2. Research the workflow of storing objects in memory and store and
>> introduce pseudocode for a new workflow of storing objects to avoid bad
>> effects on cache object usage in any form.
>> - I do know that squid uses some hash look-up and I have seen some
>> things about it.
>> - as far as I understood from the code:
>> client_request builds the request of the http object.
>> creates a mem-object and on the way creates a checksum.
>> a transfer of the mem-object to a "store" happens.
>> if a store rebuild happens it takes all of the data from the file in
>> the store.
>>
>> ? question: how does cachemgr get the list of urls in memory?

You might be confusing "cache manager" (the thing that responds to
"squidclient mgr:info" requests) with Store. Also, you should not think
in terms of memory (RAM) because some objects are only cached on disk.
It is best to think of Store as a collection of stored objects, ignoring
their particular location (memory or disk) to the extent possible.

Store can get a list of cached objects by iterating through store_table
and other store indexes. In general, you should not assume that it is
possible to get a list of all cached URLs in any efficient/practical
fashion because not all in-RAM indexes store URLs. It is only possible
to get an answer to the following question:

  * Is a response with cache key K likely to be in Squid cache now?

where the cache key is a hash computed over the request method, request
URI, and other properties.
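
A minimal sketch of that idea in Python (not Squid code): the function
name, the key layout, and the KeyIndex class are assumptions, MD5 is used
only as an illustrative hash, and the "other properties" are left out.

```python
import hashlib


def cache_key(method: str, uri: str) -> bytes:
    """Hash the request method and URI into a fixed-size key (assumed layout)."""
    h = hashlib.md5()
    h.update(method.upper().encode("ascii"))
    h.update(b"\0")  # separator so ("GE", "Thttp...") cannot collide with ("GET", "http...")
    h.update(uri.encode("utf-8"))
    return h.digest()


class KeyIndex:
    """Toy in-RAM index that keeps keys only, never URLs."""

    def __init__(self):
        self._keys = set()

    def add(self, key: bytes) -> None:
        self._keys.add(key)

    def maybe_cached(self, key: bytes) -> bool:
        # "Likely", not "certainly": a real index can be out of date.
        return key in self._keys


index = KeyIndex()
index.add(cache_key("GET", "http://example.com/a"))

known = index.maybe_cached(cache_key("GET", "http://example.com/a"))
unknown = index.maybe_cached(cache_key("HEAD", "http://example.com/a"))
```

Such an index can answer the question for a given key, but it cannot list
URLs: the hash is one-way and the URL itself is never kept.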

>> I will look at it later, but if someone has solid knowledge of how
>> the store routing was implemented, then before I rush into the code
>> every piece of info will help me when looking into it.

The Store is too big and complex an API to accurately describe in an
email IMO. I would be happy to answer specific questions about the stuff
I know, but you may have to research how things work as there is no
comprehensive documentation yet.

Another complication is that such a fundamental Squid2 Store feature as
store_table needs to be removed but it has not been completely removed
from Squid3 yet, so there is some [older] code that relies on it and
some [newer] code that tries hard to stay away from it, all while doing
the same kind of operations.

Finally, the whole Store class hierarchy is ugly to a fault. It needs to
be split into more independent classes instead of everybody and the
kitchen sink inheriting from Store, hiding the intended boundaries among
"store manager", "memory storage manager", "disk storage manager",
"cache_dir manager", etc.

Good luck,

Alex.
Received on Thu Sep 06 2012 - 18:04:53 MDT
