Re: Store_url_rewrite for squid 3+

From: Eliezer Croitoru <eliezer_at_ngtech.co.il>
Date: Fri, 07 Sep 2012 15:20:14 +0300

On 09/06/2012 09:04 PM, Alex Rousskov wrote:
> On 09/05/2012 06:58 PM, Amos Jeffries wrote:
> FWIW, I have not reviewed the store_url_rewrite code in Squid2 so I
> cannot answer the questions related to how it was done.
> I can suggest ways of doing this in Squid3, but since somebody already
> investigated all the alternatives, it would be better to hear the
> summary of the Squid2 implementation (as it relates to Store) before
> diving into Squid3 development.
The Squid2 approach makes more sense to me by the minute.
> The biggest question for me is why Squid2 code was storing multiple
> URLs with the cached object (if it was).
> Why cannot Store just work with the [rewritten] URL given to it and
> ignore the fact that some [store] URLs originated from some other
> [real] URLs?
I found the answer in the documentation, in 03_major_components:
"The Storage Manager is the glue between client and server
     sides. Every object saved in the cache is allocated a
     StoreEntry structure. While the object is being
     accessed, it also has a MemObject structure."
So I think the duplication was meant to preserve this structure and
avoid major API changes.
The deeper I get into the code, the more reasonable this decision looks
when you compare the costs and the benefits.
Losing a few bytes per object (sometimes more than a few) out of GBs of
cache is cheap compared to the time a deeper redesign would take.

Some calculations: a YouTube URL is about 600 ASCII characters. Before,
one object was one video, but now it's about 1.7MB per chunk.
So a loss of about 600 bytes of space (am I right?) compares to a gain
of 20 million bytes?
Well, at 1.7MB per chunk it's something else, but we are still talking
about a loss of only about 96KB for a whole video file.
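
A rough sketch of that arithmetic in Python, assuming the estimates above
(about 600 bytes per stored URL, about 1.7MB per chunk). The function
names and the 1024-based unit conversion are mine, not anything from
Squid:

```python
# Back-of-the-envelope overhead of keeping one extra URL per cached chunk.
# All figures are the estimates from this mail, not measurements.
URL_BYTES = 600                        # approx. length of a YouTube URL
CHUNK_BYTES = int(1.7 * 1024 * 1024)   # approx. size of one cached chunk


def url_overhead_ratio(url_bytes: int, object_bytes: int) -> float:
    """Fraction of the object's size spent on the duplicated URL."""
    return url_bytes / object_bytes


def total_url_overhead(url_bytes: int, chunks: int) -> int:
    """Bytes spent on duplicated URLs for a video split into `chunks`."""
    return url_bytes * chunks


if __name__ == "__main__":
    ratio = url_overhead_ratio(URL_BYTES, CHUNK_BYTES)
    print(f"overhead per chunk: {ratio:.4%}")
```

If the 96KB figure is simply 600 bytes per chunk added up, it would
correspond to roughly 160 chunks per video; that chunk count is my
inference, not something I measured.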

> Are we trying to support going from a store_url_rewrite config back to
> regular config without losing some of the cached objects?
Since we are talking about an attempt to solve de-duplication of static
objects, I think that is not our case.
Even for more dynamic content the mapping is always many->1, to achieve
better caching, and it makes no sense to roll back to a dynamic URL
generated per unique IP+COOKIE+TIME+other stuff.

It would make sense if there were a plan for rebuilding the cached
objects and the storeurl in the store, but that seems like too intensive
a task to treat as a reasonable, regular use case.

If someone has this kind of case, he should migrate from a generic
caching proxy to a proxy customized for that specific task. (It still
doesn't seem like a reasonable request for anything, unless you want to
be the CIA\KGB\FBI and collect data.)

<SNIP>

>>> ? question how cachemgr gets the list of urls in memory?
> You might be confusing "cache manager" (the thing that responds to
> "squidclient mgr:info" requests) with Store. Also, you should not think
> in terms of memory (RAM) because some objects are only cached on disk.
> It is best to think of Store as a collection of stored objects, ignoring
> their particular location (memory or disk) to the extent possible.
No, no, I was talking about mgr:info.
It is not directly related, but I had a question about it in the past
and will leave that for some time in the future.

But related to store_url, there was something that prevented
mgr:what-ever-gives-data-on-cached-objects from showing the store_url
objects.
After understanding how it was coded, it's pretty obvious how and why
this happened, and it can be prevented in the new development by
structuring the data correctly.

> Store can get a list of cached objects by iterating through store_table
> and other store indexes. In general, you should not assume that it is
> possible to get a list of all cached URLs in any efficient/practical
> fashion because not all in-RAM indexes store URLs. It is only possible
> to get an answer to the following question:
>
> * Is a response with cache key K likely to be in Squid cache now?
>
> Where cache key is a hash computed over the request method, request URI,
> and other properties.
This is what I remember.
So for the new development there should be an option to do that, but
also to rewrite the URL before that check or, more practically, to
change the cache key calculation if a store_url is present for the
request.
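
A minimal sketch of that idea in Python, not Squid code. It assumes the
key is an MD5 digest over the request method and URL only (the "other
properties" Alex mentions are left out), and the function name, the
separator byte and the example URLs are all hypothetical:

```python
import hashlib
from typing import Optional


def cache_key(method: str, url: str, store_url: Optional[str] = None) -> str:
    """Return a hex digest to use as the cache key.

    If a store_url is present it replaces the request URL in the key,
    so many request URLs can map to one cached object.
    """
    key_url = store_url if store_url else url
    h = hashlib.md5()
    h.update(method.encode("ascii"))
    h.update(b"\0")  # separator between fields; an assumption of this sketch
    h.update(key_url.encode("utf-8"))
    return h.hexdigest()


if __name__ == "__main__":
    # Two hypothetical per-client URLs for the same static object.
    a = "http://cdn1.example.invalid/video?id=42&token=aaa"
    b = "http://cdn2.example.invalid/video?id=42&token=bbb"
    canonical = "http://video.example.invalid/id=42"
    print(cache_key("GET", a, canonical) == cache_key("GET", b, canonical))
```

Without a store_url the two requests get different keys, which is the
duplication the feature is meant to remove.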

> <SNIP>
> The Store is too big and complex of an API to accurately describe in an
> email IMO. I would be happy to answer specific questions about the stuff
> I know, but you may have to research how things work as there is no
> comprehensive documentation yet.

I just need the major points in the process mentioned before, but it
will take some time until I have reviewed more code, so that I don't
take an uninformed position on the subject.

I do have an approach that I want to check, and it's based on finding
all\specific mem_object operations and object creation in the code.
Right now I'm struggling to find and mark those points.

Answering the double URL thing:
In 2.7 the feature used the storeswap_meta to add the storeurl string,
and that simplified things.
It's a TLV struct, so you can safely avoid harming the original request
or reply and still recover the data from any part of the code.
I noticed "STORE_META_STOREURL" is still in the TLV headers (probably to
support older cache object structure versions), so I will try to use it
for some testing purposes.
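
To show why a TLV list makes this safe, here is a small Python sketch of
the general type-length-value idea. It is not Squid's on-disk format:
the type codes, the 1-byte type plus 4-byte length layout and the
function names are assumptions made up for illustration:

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical type codes for this sketch; not Squid's real values.
META_URL = 1
META_STOREURL = 2

_HEADER = struct.Struct("!BI")  # 1-byte type, 4-byte length, no padding


def pack_tlv(entries: List[Tuple[int, bytes]]) -> bytes:
    """Serialize (type, value) pairs as consecutive TLV records."""
    out = bytearray()
    for type_code, value in entries:
        out += _HEADER.pack(type_code, len(value))
        out += value
    return bytes(out)


def unpack_tlv(buf: bytes) -> Dict[int, bytes]:
    """Parse TLV records into a dict keyed by type code."""
    result: Dict[int, bytes] = {}
    offset = 0
    while offset < len(buf):
        type_code, length = _HEADER.unpack_from(buf, offset)
        offset += _HEADER.size
        value = buf[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        result[type_code] = value
        offset += length
    return result


if __name__ == "__main__":
    meta = pack_tlv([
        (META_URL, b"http://cdn1.example.invalid/video?id=42&token=aaa"),
        (META_STOREURL, b"http://video.example.invalid/id=42"),
    ])
    fields = unpack_tlv(meta)
    print(fields[META_URL], fields.get(META_STOREURL))
```

A reader that only knows META_URL can still parse the list and skip the
extra record, which is the property I am relying on when adding the
storeurl next to the original URL.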

I have tested and added basic tests to make sure that the storeurl is
being written and used, and that it doesn't hurt any current cache
objects or anything else.
Any thoughts?

> Another complication is that such fundamental Squid2 Store feature as
> store_table needs to be removed but it has not been completely removed
> from Squid3 yet, so there is some [older] code that relies on it and
> some [newer] code that tries hard to stay away from it, all while doing
> the same kind of operations.
I will consider what to add, and where, once I have proposed the
feature implementation.
> Finally, the whole Store class hierarchy is ugly to a fault. It needs to
> be split into more independent classes instead of everybody and the
> kitchen sink inheriting from Store, hiding the intended boundaries among
> "store manager", "memory storage manager", "disk storage manager",
> "cache_dir manager", etc.
So what do you say? "Beautiful is better than ugly." (PEP 20)?
> Good luck,
>
> Alex.

Thanks,
Eliezer
Received on Fri Sep 07 2012 - 12:20:26 MDT
