Store_url_rewrite 2.7 vs\to 3.head basic review

From: Eliezer Croitoru <eliezer_at_ngtech.co.il>
Date: Sat, 08 Sep 2012 15:24:11 +0300

(I will respond to the other emails also)
A moment before I will get back to the 3.head I want to review what was
done on 2.7 vs 3.head before even touching the code.

summary:

    squid 2.7 store_url_rewrite makes use of
    mem_object->request->storeurl mainly as a null to verify that there
    wasn't any action.
    in all the cases of a test that storeurl is null another action was
    the choice.
    this avoids the need in any step to even access the original request
    url if not needed(avoiding harming the original request url) which
    leads to put code only on key points that needs access to the storeurl.

    these points in squid 3.head is at

    "httpRequest"

        adding char * storeurl related code such as initializing
        safe_free and setStoreurl\storeurl

    "store"

        adding setstoreurl
        "storeKeyPublicByRequest*" hash caclulation by storeurl

    some flags on the way to for the rewriter wrapper such as
    storeurl_done etc..

    history bugs:

        - about the schema structure using "squid://" which is a bad
        idea to use since it's a http request.
        the basic usage should be to use an internal non public domain
        name for storeurl.

        - 302 loops. problem when a url is being rewritten and the same
        url rewrite result in the same url based on the credentials.
        case: http://example.com/?arg1=111&arg2=222&arg3=333
        s/(arg1\=[\d]+).*(arg2\=[\d]+)/http...$1&$2/g
        then the same url with another arg4=444 will result in the same
        rewritten url but the arg4 will result in a 302 reply and will
        be stored by the refresh patterns.
        this can result the the client requesting the same url in a loop.
        there was a patch for that but it refers to all 302 replies and
        is not applied to the current 3 stable.(kind of tested)
        this raises the problem that a conflict between the cache_deny
        allow status_code_acl and refresh pattern wont be applied on the
        current version.

    - - -
    features that was requested before for the feature:
    internally regex store_url /url_rewrite engine. (there was some
    patch about it here:
    http://www.squid-cache.org/Versions/v2/2.HEAD/changesets/11979.patch)
    which will work for most cases with maybe some small exceptions.

    - - -

    I hope the above review was in a form that will be understood to all.
    what you have in mind just throw it in the air.

    the development can go in two ways:
    1. using squid internal regex
    2. using external helper.

    I'm up for option 2 since it will use the exists more stable
    redirect code and will be allowed later if wanted to upgrade the
    code to support option 1.

- - -
in more words and code:

    1: public store key

        the md5hash calculation method: storeKeyPublicByRequestMethod
        was changed to apply hash for the request.
        if storeurl then by storeurl else by url.

        the way the hash is calculated seems like gone a significant
        change of logic since then 2.7
        most of squid else then icp and store dont use
        "storeKeyPublic(const char *url, const HttpRequestMethod&
        method)" but use

        storeKeyPublicByRequest(HttpRequest * request)
        {
             return storeKeyPublicByRequeclistMethod(request,
        request->method);
        }

        ICP is using the method and url for icp requests from other
        peers then it shouldn't bother store_url.

        StoreEntry::setPublicKey in store.cc actully actually is using
        the direct "storeKeyPublic" method only around the vary code so
        I need help from someone who has some background with this
        specific piece of code to decide the better approach about it.
        it uses the "mem_obj->url" so changing anything should be simple.

    2. meta data

        was used on store_swapmeta and on store_client
        on swapmeta to encapsulate it in meta.
        on store_client on the process of evaluating the fetched object
        from the cache.
        - There was a bug related to meta data 2248
        <http://bugs.squid-cache.org/show_bug.cgi?id=2248> that what I
        think the plan is to avoid using it.

        in 3.head the method to do it is to make sure each of the next
        are consistent.(no store_url)
             STORE_META_KEY:
             STORE_META_URL:
             STORE_META_VARY_HEADERS:
             STORE_META_STD:
             STORE_META_STD_LFS:
             STORE_META_OBJSIZE:

    3. request struct

        add store_url to store the rewritten url
        being checked in the publickey creation.

    4. store

        setstoreurl method for the store_entry->mem_obj->store_url
        safefree(store_url) in destroy memory object process.

        - the next are working somehow very different in 3.head.
        AddVaryState struct contains store_url (taken from
        mem_obj->store_url)
        safefree(store_url) in free_AddVaryState

        in storeAddVary adding a header of "x-squid-internal/vary" with
        time.

        from all the above section it seems to me like in 3.head it far
        more simple to test since the usage of mem_obj->url simplified
        things.
        (in 3.head)I followed the calls back to storeCreateEntry

        storeCreateEntry(const char *url, const char *log_url,
        request_flags flags, const HttpRequestMethod& method)
        {
             ...
             e = new StoreEntry(url, log_url);
             ...
             if (neighbors_do_private_keys || !flags.hierarchical)
                 e->setPrivateKey();
             else
                 e->setPublicKey();
             ...
        }

        it's being called from client_side_request.cc at the docallouts at:
        {...
          if (calloutContext->error) {
                 const char *uri = urlCanonical(request);
                 StoreEntry *e= storeCreateEntry(uri, uri,
        request->flags, request->method);
        #if USE_SSL..
             }
        ...}

        and I dont really know how to even look at this case.
        for me it seems like this an error case which in this case no
        need to lookout for\from.

    5. stat

        was used to print the lookup url as the mem_obj->store_url as
        addition to the mem_obj->url
        on squid 3.head only the key is being used for that.

    6. protos

        declaring the storeEntrysetStoreUrl method and storeAddVary
        which I understood dosnt exists.

    7. httprequest

        destroying the store_url in request

    8. client_side

        clientCreateStoreEntry which is after the store_url is written
        to the request->store_url.
        and other methods doing things based on the vary and etag which
        is handled in other way in 3.head.
        all the rest was in the cache_hit handling and debugging and
        since 3.head is using the key to find a hit i'm almost sure this
        is not needed.

    9. last: client_side_storeurl_rewrite

        was implemented actually in clientStoreURLRewriteDone which just
        set the (request->store_url).

- - -

hope this will help the project move forward towards the goal.(helped me
a lot)

Eliezer
Received on Sat Sep 08 2012 - 12:24:23 MDT

This archive was generated by hypermail 2.2.0 : Sat Sep 08 2012 - 12:00:09 MDT