Re: Store_url_rewrite for squid 3+

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 07 Sep 2012 08:10:00 -0600

On 09/07/2012 06:20 AM, Eliezer Croitoru wrote:
> On 09/06/2012 09:04 PM, Alex Rousskov wrote:

>> The biggest question for me is why Squid2 code was storing multiple
>> URLs with the cached object (if it was).
>> Why cannot Store just work with the [rewritten] URL given to it and
>> ignore the fact that some [store] URLs originated from some other
>> [real] URLs?

> I found the answer in the documentation at: 03_major_componenets:
> "The Storage Manager is the glue between client and server
> sides. Every object saved in the cache is allocated a
> StoreEntry structure. While the object is being
> accessed, it also has a MemObject structure."

I do not understand how the above quote answers my question. Moreover,
the above text is stale -- Squid3 (Rock Store and shared memory caches
specifically) does not allocate StoreEntry for every object saved in the
cache.

Furthermore, as far as I can tell, current Squid3 code does not store
the request URL at all (please correct me if I am wrong). This may imply
that there is no [pressing] need to store the rewritten URL either (or
at least we should have a clear understanding of why it needs to be
stored, and storing it may be viewed as a separate project/improvement).

> So I think the duplication was there to preserve this structure and
> prevent major API changes.
> The deeper I get into the code, the more reasonable this decision looks
> when you compare the costs and benefits.
> Losing a few bytes (sometimes more than a few) out of gigabytes is
> cheap compared to in-depth development time.

I do not know what benefits you are talking about. Why do we need to
store the rewritten URL? In other words, what will break if we do not
store the rewritten URL? Again, I am not saying that storing URLs is
wrong -- I just want to understand why we need to do that.

One possible use is consistency checks. If we store the request URL, the
hit serving code can double check that the current request URL is the
same as the stored request URL. However, those checks do not explain why
we need to store two URLs (rewritten and original) and they should be
viewed as a separate improvement/project outside of your work scope.
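
To illustrate the consistency check idea only, here is a minimal sketch in
Python rather than Squid code. CachedEntry, its request_url field, and
serve_hit() are hypothetical names made up for this example; as noted above,
Squid3 does not appear to store the request URL today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    # Hypothetical: the request URL recorded when the response was cached.
    request_url: str
    body: bytes


def serve_hit(entry: CachedEntry, current_url: str) -> Optional[bytes]:
    """Return the cached body only if the stored URL matches the request."""
    if entry.request_url != current_url:
        # Key collision or index corruption: treat the lookup as a miss.
        return None
    return entry.body
```

Note that the check needs just one stored URL, the one the cache key was
computed from.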

>> Store can get a list of cached objects by iterating through store_table
>> and other store indexes. In general, you should not assume that it is
>> possible to get a list of all cached URLs in any efficient/practical
>> fashion because not all in-RAM indexes store URLs. It is only possible
>> to get an answer to the following question:
>>
>> * Is a response with cache key K likely to be in Squid cache now?
>>
>> Where cache key is a hash computed over the request method, request URI,
>> and other properties.

> That is what I remember.
> So for the new development there should be an option to do that, but
> also to rewrite the URL before checking or, more practically, to
> change the cache key calculation if a store_url is present for
> the request.

I am not sure what "option" you are referring to in the above. The
Store::get(key) API I have described is not optional -- it is the
primary way of detecting a hit.

The URL rewriting must happen before or during Store key calculation.
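
A rough sketch of that ordering, in Python for brevity. This is not Squid's
actual key function: rewrite_store_url(), its mirror-host rule, and the exact
bytes fed to the hash are assumptions made for illustration.

```python
import hashlib
import re


def rewrite_store_url(url: str) -> str:
    """Hypothetical rewrite rule: collapse numbered mirror hosts into one."""
    return re.sub(
        r"^http://mirror\d+\.example\.com/",
        "http://mirror.example.com.internal/",
        url,
    )


def store_key(method: str, url: str) -> str:
    """Compute a cache key over the method and the rewritten URL."""
    # The rewrite happens first, so the key never sees the original URL.
    store_url = rewrite_store_url(url)
    return hashlib.md5(f"{method} {store_url}".encode("utf-8")).hexdigest()
```

With this ordering, requests for mirror1 and mirror7 produce the same key,
and the Store lookup itself does not need to know that a rewrite happened.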

> I do have an approach that I want to check. It is based on finding
> all/specific mem_object operations and object creation in the code.
> Right now I am struggling to find and mark those points.
>
> Answering the double URL question:
> In 2.7 the feature used storeswap_meta to add the storeurl string,
> and that simplified things.

Yes, if we need to store URL(s), we should use the StoreMeta API.

Why do we need to store URL(s)?

> I noticed "STORE_META_STOREURL" is still in the TLV headers (probably to
> support older cache object structure versions), so I will try to use it
> for some testing purposes.

STORE_META_STOREURL is unused in Squid3 AFAICT.
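
For readers unfamiliar with TLV metadata, here is a generic
type-length-value sketch in Python. It is not Squid's on-disk swap metadata
layout: the one-byte type, the four-byte network-order length, and the
STORE_URL_TYPE value are all assumptions made for this example.

```python
import struct

# Hypothetical type code for a store URL field; not Squid's actual value.
STORE_URL_TYPE = 0x7F

HEADER = "!BI"  # 1-byte type, 4-byte length, network byte order
HEADER_SIZE = struct.calcsize(HEADER)


def pack_tlv(type_id: int, value: bytes) -> bytes:
    """Encode one field as type, length, value."""
    return struct.pack(HEADER, type_id, len(value)) + value


def unpack_tlvs(buf: bytes) -> list:
    """Decode a buffer of concatenated TLV fields into (type, value) pairs."""
    fields = []
    offset = 0
    while offset < len(buf):
        type_id, length = struct.unpack_from(HEADER, buf, offset)
        offset += HEADER_SIZE
        fields.append((type_id, buf[offset:offset + length]))
        offset += length
    return fields
```

A reader that does not recognize a type can skip the field by its length,
which is how an unused type can remain defined without affecting anything.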

> I have tested and added basic tests to make sure that the storeurl is
> being written and used, and that it does not hurt any current cache
> objects or anything else.
> Any thoughts?

Why do we need to store it?

Thank you,

Alex.
Received on Fri Sep 07 2012 - 14:10:23 MDT
