Re: some notes and help to think is needed + Test results of testing basics in store_url_rewrite.

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 27 Sep 2012 17:55:36 +1200

On 27/09/2012 4:09 p.m., Eliezer Croitoru wrote:
> Well I was reading here and there the store code and other things then
> started testing some theories about how store_url can be and should be
> implemented.
>
> If you do remember or not I refactored the source in couple places to
> use originalUrl and original_url to check all the places where
> original url is used and then decide where the store_url should be
> placed if needed.
> (memobject->original_url and has memobject->store_url)
>
> the main place of all I have seen that should be used with the
> store_url is the
> http://bazaar.launchpad.net/~squid/squid/3-trunk/view/head:/src/store_key_md5.cc#L144
>
> which gets requests from couple places and mainly the setPublic for
> store entry
> (Does bazaar has search option in it ?)

Doxygen provides all the references/referenced-by information.
http://www.squid-cache.org/Doc/code/store__key__md5_8cc.html#a5461ad8f7a969c8bec3008a70a74c42a

>
> so I changed the code to use the store_url in a case it exists in the
> memobject instead of the "urlCanonical(request)".
> The result is good since it makes the predicted md5 hash for the url
> based on the store_url and not the original_url.
>
> At this point I started looking up for more places that original_url
> might should be replaced since maybe there is a place I missed like
> when storing or fetching the file for any reason there is that I cant
> see and think of.
>
> I placed a test file on my server called v.flv which is a static text
> file.
> I used a store_url helper that will use the arguments id itag and
> range (something I know and have script for) to rewrite the url.
> from
> 'http://www1.ngtech.co.il/v.flv?id=1000&itag=34&range=0-100&icu=0&tag1' into
> 'http://ngtech.squid.internal/id=1000&itag=34&range=0-100'
>
> which I later will change the id itag and range on the request and
> check what happens in my extended logs.
>
> so as expected:
>
> 2012/09/27 04:34:17 kid1| A candidate recived from Store URL rewriter
> accepted: http://ngtech.squid.internal/id=1000&itag=34&range=0-100
> 2012/09/27 04:34:17.157 kid1| storeKeypublicreuest could be with URL
> and method: GET http://ngtech.squid.internal/id=1000&itag=34&range=0-100
> 2012/09/27 04:34:17.157 kid1| storeKeypublicreuest with URL and
> method: GET http://ngtech.squid.internal/id=1000&itag=34&range=0-100
> 2012/09/27 04:34:17.157 kid1| storeKeypublicreuest was created using
> method and url from request
> 2012/09/27 04:34:17.157 kid1| storeGet: looking up
> 4F2F79AE5171037CBE02EA95ECD5B86C
>
> the ^^^ hexdigest is what expected to be looked for.
> instead of looking up for: 7F77FC394B370386D2E6AD9F8B748846
> its looking for: 4F2F79AE5171037CBE02EA95ECD5B86C
>
> which is a very good progress and the file was cached based on the
> store_url hash.
>
> I started seeing somethings with large files so I reduced the file
> size to 702 bytes and then I will be able to look over things without
> too much stuff running because of the size.
>
> a new session new file and other url which result a good 304 validation
> and the web server logs shows 200 for the first and 304 for the second.
> ##access.log
> 1348715219.641 205 127.0.0.1 TCP_MISS/200 1088 GET
> http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 -
> HIER_DIRECT/79.181.248.214 video/x-flv
> 1348715278.783 11 127.0.0.1 TCP_REFRESH_UNMODIFIED/200 1096 GET
> http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 -
> HIER_DIRECT/79.181.248.214 video/x-flv
>
> (you can skip this log or jump over what you know)
> ##cache.log
> A candidate recived from Store URL rewriter accepted:
> http://ngtech.squid.internal/id=100&itag=34
> storeKeypublicreuest could be with URL and method: GET
> http://ngtech.squid.internal/id=100&itag=34
> storeKeypublicreuest with URL and method: GET
> http://ngtech.squid.internal/id=100&itag=34
> storeKeypublicreuest was created using method and url from request
> storeGet: looking up A5E995AF63D94AC8E7B40ACA255948F1
> src/store_dir.cc(748) get: got in-transit entry: -1_at_-1=1/2/0/0
> StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
> storeCreateEntry: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1'
> src/store.cc(395) StoreEntry: new StoreEntry 0x247c080
> src/MemObject.cc(109) MemObject: new MemObject 0x2788050
> storeKeyPrivate: GET http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
> storeKeyPrivate was created using method and url is:
> StoreEntry::hashInsert: Inserting Entry 0x247c080 key
> '56E7D8D797D8F666B16138EF1686E5B7'
> StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=2
> StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
> src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used
> here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
> src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be
> used with URL:http://ngtech.squid.internal/id=100&itag=34
> StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=4
> StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
> src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used
> here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
> src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be
> used with URL:http://ngtech.squid.internal/id=100&itag=34
> src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used
> here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
> src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be
> used with URL:http://ngtech.squid.internal/id=100&itag=34
> StoreEntry::replaceHttpReply:
> http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
> ctx: enter level 0: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1'
> StoreEntry::expireNow: '56E7D8D797D8F666B16138EF1686E5B7'
> StoreEntry::setReleaseFlag: '56E7D8D797D8F666B16138EF1686E5B7'
> ctx: exit level 0
> StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=4
> storeSwapOutFileClose: 56E7D8D797D8F666B16138EF1686E5B7 how=1
> storeSwapOutFileClose: sio = 0
> StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
> StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=2
> StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=2
> StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=1
> StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=0
> storeRelease: Releasing: '56E7D8D797D8F666B16138EF1686E5B7'
> src/store.cc(460) destroyStoreEntry: destroyStoreEntry: destroying
> 0x247c088
> src/store.cc(438) destroyMemObject: destroyMemObject 0x2788050
> src/MemObject.cc(132) ~MemObject: del MemObject 0x2788050
> StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
> StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=0
>
>
> now comes the weird part:
> I have tried using another url that should result in the same
> rewritten url: http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11
> which will result in: http://ngtech.squid.internal/id=100&itag=34
>
> But while in the access.log it shows:
> 1348716354.472 115 127.0.0.1 TCP_HIT/200 1096 GET
> http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11 -
> HIER_DIRECT/79.181.248.214 video/x-flv
>
> the actual server log shows:
> IP - - [27/Sep/2012:05:24:38 +0200] "GET /v.flv?id=100&itag=34&ii=11
> HTTP/1.1" 200 702
>
> So I looked at the cache.log to understand what happens that it shows
> HIT in one place but does another thing.
>
> the full log is here: http://www1.ngtech.co.il/store_url/test3.txt
> but I noticed one thing:
>
> storeGet: looking up A5E995AF63D94AC8E7B40ACA255948F1
> src/store_dir.cc(748) get: got in-transit entry: -1_at_-1=1/2/0/0
> StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
> clientProcessHit: URL mismatch,
> 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1' !=
> 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11'
> StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=0
> storeCreateEntry: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11'
>
> So there is a case in squid which the log shows HIT but the request is
> being fetched from source since there is a mismatch between the url in
> the stored entry (regardless the hash match) and the actual request.
>
> related code at:
> http://bazaar.launchpad.net/~squid/squid/3-trunk/view/head:/src/client_side_reply.cc#L504
>
>
>
> This is good for regular requests but not good for rewritten store_url
> since it's nature.
> The options are:
> 1. store in meta data the store_url and using it while checking for a
> HIT if exists.
> 2. do not test for url mismatch in a case of store_url.
>
> The above is the only blockage I until now saw in my testings.
>
> I want to dig into the code more and see the exact reason and
> mechanism of this clientProcessHit and the surrounding related code
> (that actually fetch the file from the server).

This is the HTTP/1.1 variant handling code. Each request URL may have
multiple response variants depending on the things indicated by Vary:
header and possibly ETag header as a unique ID for each variant. How it
is done in Squid is that the store contains a vary meta object as well
as the normal file meta object.

The storeURL should point at the meta object, and then when loaded meta
object points at other bits to combine in addition to the storeURL to
make a new key for a second lookup which is expected to find the actual
file object for that variant.

You should make these secondary keys use the storeURL instead of the
original URL (only when the original URL was used, some variant may be
looked up based on ETag or digest values alone).

Some other places which need it are:
   PURGE requets handling lookups
   ICP request lookups
   HTCP request lookups

   Cache Digest is a little more tricky. Ignore it for now. The remote
proxy will never be guaranteed to do the right lookups even if you add
both original and store URL to the digest.

> I Also noticed something disturbing that is not related to any of
> store_url code but more to debug_options(maybe one of my debug
> additions but it is not happening for almost anything else)
> when and only when i'm fetching crossdomain.xml from most of the sites
> I have tried like youtube domains while debug_options are at ALL,1 20,3
> I'm getting some segmentation problem and squid dies and restarts.
>

Hmm. Worth checking separately and fixing outside the storeURL project
if you can identify what it is.

> It's something that I have never saw until couple weeks ago.
> the cache.log is here:
> http://www1.ngtech.co.il/store_url/segmentation.txt
>
>
> Eliezer
Received on Thu Sep 27 2012 - 05:55:47 MDT

This archive was generated by hypermail 2.2.0 : Thu Sep 27 2012 - 12:00:18 MDT