some notes and help to think is needed + Test results of testing basics in store_url_rewrite.

From: Eliezer Croitoru <eliezer_at_ngtech.co.il>
Date: Thu, 27 Sep 2012 06:09:48 +0200

Well I was reading here and there the store code and other things then
started testing some theories about how store_url can be and should be
implemented.

If you do remember or not I refactored the source in couple places to
use originalUrl and original_url to check all the places where original
url is used and then decide where the store_url should be placed if needed.
(memobject->original_url and has memobject->store_url)

the main place of all I have seen that should be used with the store_url
is the
http://bazaar.launchpad.net/~squid/squid/3-trunk/view/head:/src/store_key_md5.cc#L144

which gets requests from couple places and mainly the setPublic for
store entry
(Does bazaar has search option in it ?)

so I changed the code to use the store_url in a case it exists in the
memobject instead of the "urlCanonical(request)".
The result is good since it makes the predicted md5 hash for the url
based on the store_url and not the original_url.

At this point I started looking up for more places that original_url
might should be replaced since maybe there is a place I missed like when
storing or fetching the file for any reason there is that I cant see and
think of.

I placed a test file on my server called v.flv which is a static text file.
I used a store_url helper that will use the arguments id itag and range
(something I know and have script for) to rewrite the url.
from
'http://www1.ngtech.co.il/v.flv?id=1000&itag=34&range=0-100&icu=0&tag1'
into 'http://ngtech.squid.internal/id=1000&itag=34&range=0-100'

which I later will change the id itag and range on the request and check
what happens in my extended logs.

so as expected:

2012/09/27 04:34:17 kid1| A candidate recived from Store URL rewriter
accepted: http://ngtech.squid.internal/id=1000&itag=34&range=0-100
2012/09/27 04:34:17.157 kid1| storeKeypublicreuest could be with URL and
method: GET http://ngtech.squid.internal/id=1000&itag=34&range=0-100
2012/09/27 04:34:17.157 kid1| storeKeypublicreuest with URL and method:
GET http://ngtech.squid.internal/id=1000&itag=34&range=0-100
2012/09/27 04:34:17.157 kid1| storeKeypublicreuest was created using
method and url from request
2012/09/27 04:34:17.157 kid1| storeGet: looking up
4F2F79AE5171037CBE02EA95ECD5B86C

the ^^^ hexdigest is what expected to be looked for.
instead of looking up for: 7F77FC394B370386D2E6AD9F8B748846
its looking for: 4F2F79AE5171037CBE02EA95ECD5B86C

which is a very good progress and the file was cached based on the
store_url hash.

I started seeing somethings with large files so I reduced the file size
to 702 bytes and then I will be able to look over things without too
much stuff running because of the size.

a new session new file and other url which result a good 304 validation
and the web server logs shows 200 for the first and 304 for the second.
##access.log
1348715219.641 205 127.0.0.1 TCP_MISS/200 1088 GET
http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 -
HIER_DIRECT/79.181.248.214 video/x-flv
1348715278.783 11 127.0.0.1 TCP_REFRESH_UNMODIFIED/200 1096 GET
http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1 -
HIER_DIRECT/79.181.248.214 video/x-flv

(you can skip this log or jump over what you know)
##cache.log
A candidate recived from Store URL rewriter accepted:
http://ngtech.squid.internal/id=100&itag=34
storeKeypublicreuest could be with URL and method: GET
http://ngtech.squid.internal/id=100&itag=34
storeKeypublicreuest with URL and method: GET
http://ngtech.squid.internal/id=100&itag=34
storeKeypublicreuest was created using method and url from request
storeGet: looking up A5E995AF63D94AC8E7B40ACA255948F1
src/store_dir.cc(748) get: got in-transit entry: -1_at_-1=1/2/0/0
StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
storeCreateEntry: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1'
src/store.cc(395) StoreEntry: new StoreEntry 0x247c080
src/MemObject.cc(109) MemObject: new MemObject 0x2788050
storeKeyPrivate: GET http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
storeKeyPrivate was created using method and url is:
StoreEntry::hashInsert: Inserting Entry 0x247c080 key
'56E7D8D797D8F666B16138EF1686E5B7'
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=2
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used
here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be used
with URL:http://ngtech.squid.internal/id=100&itag=34
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=4
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used
here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be used
with URL:http://ngtech.squid.internal/id=100&itag=34
src/store.cc(1677) originalUrl: A storeEntry::origianlUrl being used
here using URL:http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
src/store.cc(1678) originalUrl: A storeEntry::origianlUrl Could be used
with URL:http://ngtech.squid.internal/id=100&itag=34
StoreEntry::replaceHttpReply:
http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1
ctx: enter level 0: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1'
StoreEntry::expireNow: '56E7D8D797D8F666B16138EF1686E5B7'
StoreEntry::setReleaseFlag: '56E7D8D797D8F666B16138EF1686E5B7'
ctx: exit level 0
StoreEntry::lock: key '56E7D8D797D8F666B16138EF1686E5B7' count=4
storeSwapOutFileClose: 56E7D8D797D8F666B16138EF1686E5B7 how=1
storeSwapOutFileClose: sio = 0
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=3
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=2
StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=2
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=1
StoreEntry::unlock: key '56E7D8D797D8F666B16138EF1686E5B7' count=0
storeRelease: Releasing: '56E7D8D797D8F666B16138EF1686E5B7'
src/store.cc(460) destroyStoreEntry: destroyStoreEntry: destroying 0x247c088
src/store.cc(438) destroyMemObject: destroyMemObject 0x2788050
src/MemObject.cc(132) ~MemObject: del MemObject 0x2788050
StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=0

now comes the weird part:
I have tried using another url that should result in the same rewritten
url: http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11
which will result in: http://ngtech.squid.internal/id=100&itag=34

But while in the access.log it shows:
1348716354.472 115 127.0.0.1 TCP_HIT/200 1096 GET
http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11 -
HIER_DIRECT/79.181.248.214 video/x-flv

the actual server log shows:
IP - - [27/Sep/2012:05:24:38 +0200] "GET /v.flv?id=100&itag=34&ii=11
HTTP/1.1" 200 702

So I looked at the cache.log to understand what happens that it shows
HIT in one place but does another thing.

the full log is here: http://www1.ngtech.co.il/store_url/test3.txt
but I noticed one thing:

storeGet: looking up A5E995AF63D94AC8E7B40ACA255948F1
src/store_dir.cc(748) get: got in-transit entry: -1_at_-1=1/2/0/0
StoreEntry::lock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=1
clientProcessHit: URL mismatch,
'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=1' !=
'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11'
StoreEntry::unlock: key 'A5E995AF63D94AC8E7B40ACA255948F1' count=0
storeCreateEntry: 'http://www1.ngtech.co.il/v.flv?id=100&itag=34&ii=11'

So there is a case in squid which the log shows HIT but the request is
being fetched from source since there is a mismatch between the url in
the stored entry (regardless the hash match) and the actual request.

related code at:
http://bazaar.launchpad.net/~squid/squid/3-trunk/view/head:/src/client_side_reply.cc#L504

This is good for regular requests but not good for rewritten store_url
since it's nature.
The options are:
1. store in meta data the store_url and using it while checking for a
HIT if exists.
2. do not test for url mismatch in a case of store_url.

The above is the only blockage I until now saw in my testings.

I want to dig into the code more and see the exact reason and mechanism
of this clientProcessHit and the surrounding related code (that actually
fetch the file from the server).

I do know that the Client* code is using the request url since it's
fetching the original request but I want to prevent it from doing it and
since there is a store_url to ignore any mismatches there is to the url.

So thoughts and opinions are more then welcome.

If you do have question just ask me and I will clear out whatever needed
and notice that the changes to the original code are very very small and
can be understood in less then minutes.

I Also noticed something disturbing that is not related to any of
store_url code but more to debug_options(maybe one of my debug additions
but it is not happening for almost anything else)
when and only when i'm fetching crossdomain.xml from most of the sites I
have tried like youtube domains while debug_options are at ALL,1 20,3
I'm getting some segmentation problem and squid dies and restarts.

It's something that I have never saw until couple weeks ago.
the cache.log is here: http://www1.ngtech.co.il/store_url/segmentation.txt

Eliezer
Received on Thu Sep 27 2012 - 04:10:03 MDT

This archive was generated by hypermail 2.2.0 : Thu Sep 27 2012 - 12:00:18 MDT