On 09/06/2012 09:04 PM, Alex Rousskov wrote:
> On 09/05/2012 06:58 PM, Amos Jeffries wrote:
> FWIW, I have not reviewed the store_url_rewrite code in Squid2 so I 
> cannot answer the questions related to how it was done.
> I can suggest ways of doing this in Squid3, but since somebody already 
> investigated all the alternatives, it would be better to hear the 
> summary of the Squid2 implementation (as it relates to Store) before 
> diving into Squid3 development. 
The Squid2 form makes more sense to me by the minute.
> The biggest question for me is why Squid2 code was storing multiple 
> URLs with the cached object (if it was).
> Why cannot Store just work with the [rewritten] URL given to it and 
> ignore the fact that some [store] URLs originated from some other 
> [real] URLs?
I found the answer in the documentation, in 03_major_components:
"The Storage Manager is the glue between client and server
     sides.  Every object saved in the cache is allocated a
     StoreEntry structure.  While the object is being
     accessed, it also has a MemObject structure."
So I think the duplication was there to preserve this structure and to
avoid major API changes.
The deeper I get into the code, the more reasonable this decision looks
once you compare the costs and the benefits.
Losing a few bytes (sometimes more than a few) out of gigabytes is cheap
compared to in-depth development time.
Some calculations: a YouTube URL is about 600 ASCII characters. Before,
one object was one video, but now it is about 1.7MB per chunk.
So a loss of about 600 bytes of space (am I right?) compares to a gain
of 20 million bytes?
Well, with 1.7MB chunks it is something else, but we are still talking
about a loss of about 96KB for a video file.
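To put rough numbers on that trade-off, here is a small Python sketch that uses only the figures above: about 600 bytes per stored URL, about 1.7MB per chunk, and about 96KB of URL overhead per video. The chunk count is derived from those figures, not measured, and I assume 1MB = 1,000,000 bytes and read "96Kb" as kilobytes. The function names are made up for this example.

```python
# Back-of-the-envelope overhead of storing an extra ~600-byte URL per chunk.
# All inputs are the rough figures from this mail, not measurements.
URL_BYTES = 600                # approximate length of a YouTube URL
CHUNK_BYTES = 1_700_000        # ~1.7MB per chunk (assuming 1MB = 10**6 bytes)
VIDEO_OVERHEAD_BYTES = 96_000  # ~96KB of extra URL data per video


def overhead_ratio(url_bytes: int, object_bytes: int) -> float:
    """Fraction of an object's size spent on the extra stored URL."""
    return url_bytes / object_bytes


def implied_chunks(total_overhead: int, url_bytes: int) -> int:
    """Number of chunks implied by the per-video overhead figure."""
    return total_overhead // url_bytes


if __name__ == "__main__":
    ratio = overhead_ratio(URL_BYTES, CHUNK_BYTES)
    print(f"overhead per chunk: {ratio:.4%}")
    print(f"chunks implied: {implied_chunks(VIDEO_OVERHEAD_BYTES, URL_BYTES)}")
```

With these inputs the extra URL costs well under a tenth of a percent of each chunk, which is the point of the comparison.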
> Are we trying to support going from a store_url_rewrite config back to 
> regular config without losing some of the cached objects? 
Since we are talking about an attempt to solve de-duplication of static
objects, I do not think that is our case.
Even for more dynamic content the mapping is always many->1, to achieve
better caching, so it makes no sense to roll back to a dynamic URL
generated per unique IP+COOKIE+TIME+other stuff.
It would make sense if there were a plan for rebuilding the cached
objects and the storeurl in the store, but that seems like too intensive
a task to treat as a reasonable, regular use case.
If someone has this kind of case, they should migrate from a generic
caching proxy to a proxy customized for the specific task. (It still
does not seem like a reasonable request for anything, unless you want to
be the CIA\KGB\FBI and collect data.)
<SNIP>
>>> ? question how cachemgr gets the list of urls in memory?
> You might be confusing "cache manager" (the thing that responds to
> "squidclient mgr:info" requests) with Store. Also, you should not think
> in terms of memory (RAM) because some objects are only cached on disk.
> It is best to think of Store as a collection of stored objects, ignoring
> their particular location (memory or disk) to the extent possible.
No, no, I was talking about mgr:info.
It is not directly related, but I had a question about it in the past
and will leave it for some time in the future.
Related to store_url, though: something prevented
mgr:what-ever-gives-data-on-cached-objects from showing the store_url
objects.
Now that I understand how it was coded, it is pretty obvious how and why
this happened, and it can be prevented in new development by structuring
the data correctly.
> Store can get a list of cached objects by iterating through store_table
> and other store indexes. In general, you should not assume that it is
> possible to get a list of all cached URLs in any efficient/practical
> fashion because not all in-RAM indexes store URLs. It is only possible
> to get an answer to the following question:
>
>    * Is a response with cache key K likely to be in Squid cache now?
>
> Where cache key is a hash computed over the request method, request URI,
> and other properties.
This is what I remember.
So the new development should keep an option to do that lookup, but
rewrite the URL before doing it or, more practically, change the cache
key calculation when a store_url is present for the request.
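A minimal sketch of that idea, in Python rather than Squid's C++. It assumes the cache key is an MD5 digest over the request method and URL, and that a rewritten store URL, when present, simply replaces the request URL as hash input. The function name `cache_key`, its parameters, and the example hosts are hypothetical; the real key may cover other properties as well.

```python
import hashlib
from typing import Optional


def cache_key(method: str, url: str, store_url: Optional[str] = None) -> bytes:
    """Hash the method plus the store URL if present, else the request URL."""
    effective_url = store_url if store_url else url
    digest = hashlib.md5()
    digest.update(method.encode("ascii"))
    digest.update(b"\0")  # separator so method and URL cannot run together
    digest.update(effective_url.encode("utf-8"))
    return digest.digest()


if __name__ == "__main__":
    # Two dynamic URLs mapped to the same store URL should share one key.
    store = "http://video.example.internal/id=42"
    a = cache_key("GET", "http://cdn1.example.com/v?id=42&token=abc", store)
    b = cache_key("GET", "http://cdn2.example.com/v?id=42&token=xyz", store)
    print(a == b)
```

Without a store URL the key falls back to the request URL, so requests that are not rewritten behave as before.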
> <SNIP>
> The Store is too big and complex of an API to accurately describe in an
> email IMO. I would be happy to answer specific questions about the stuff
> I know, but you may have to research how things work as there is no
> comprehensive documentation yet.
I just need the major points of the process mentioned before, but it
will take some time: I want to review more code before taking even a
basic position on the subject.
I do have an approach that I want to check, and it is based on finding
all\specific mem_object operations and object creation in the code.
Right now I am struggling to find and mark those points.
Answering the double URL question:
In 2.7 the feature used the storeswap_meta to add the storeurl string,
which simplified things.
It is a TLV struct, so you can add data safely without harming the
original request or reply, and recover the data from any part of the
code.
I noticed that "STORE_META_STOREURL" is still in the TLV headers
(probably to support older cache object structure versions), so I will
try to use it for some testing purposes.
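To illustrate why a TLV (type, length, value) list makes this safe, here is a small Python sketch. The layout (one type byte, a four-byte little-endian length, then the value) and the type codes `META_URL` and `META_STOREURL` are assumptions for illustration only, not Squid's actual on-disk format or enum values.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical type codes for this sketch; not Squid's actual enum values.
META_URL = 1
META_STOREURL = 2

_HEADER = struct.Struct("<BI")  # type: 1 byte, length: 4 bytes


def pack_tlv(fields: List[Tuple[int, bytes]]) -> bytes:
    """Serialize (type, value) pairs as header + value, back to back."""
    out = bytearray()
    for ftype, value in fields:
        out += _HEADER.pack(ftype, len(value))
        out += value
    return bytes(out)


def unpack_tlv(buf: bytes) -> Dict[int, bytes]:
    """Parse a TLV buffer into a {type: value} mapping."""
    fields: Dict[int, bytes] = {}
    offset = 0
    while offset < len(buf):
        ftype, length = _HEADER.unpack_from(buf, offset)
        offset += _HEADER.size
        fields[ftype] = buf[offset:offset + length]
        offset += length
    return fields


if __name__ == "__main__":
    meta = pack_tlv([
        (META_URL, b"http://cdn1.example.com/v?id=42&token=abc"),
        (META_STOREURL, b"http://video.example.internal/id=42"),
    ])
    print(unpack_tlv(meta)[META_STOREURL])
```

Because each field carries its own type and length, code that only cares about the original URL can skip the extra store URL entry, and an entry written without it still parses.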
I have added and run basic tests to make sure that the storeurl is being
written and used, and that it does not hurt any existing cache objects.
Any thoughts?
> Another complication is that such fundamental Squid2 Store feature as
> store_table needs to be removed but it has not been completely removed
> from Squid3 yet, so there is some [older] code that relies on it and
> some [newer] code that tries hard to stay away from it, all while doing
> the same kind of operations.
When proposing the feature implementation, I will consider what to add
and where once the proposal is out.
> Finally, the whole Store class hierarchy is ugly to a fault. It needs to
> be split into more independent classes instead of everybody and the
> kitchen sink inheriting from Store, hiding the intended boundaries among
> "store manager", "memory storage manager", "disk storage manager",
> "cache_dir manager", etc.
So what do you say? "Beautiful is better than ugly." (PEP 20)?
> Good luck,
>
> Alex.
Thanks,
Eliezer
Received on Fri Sep 07 2012 - 12:20:26 MDT