RE: [squid-users] storeurl_rewriter and "URL mismatch" log entries

From: Kathleen M Kelly <kmkelly_at_yahoo-inc.com>
Date: Wed, 18 Nov 2009 10:51:17 -0800

The issue I'm seeing is that the original fetch url somehow gets into
the cache even though storeurl_rewriter is running and should be
normalizing it so that only the normalized url is cached. This seems to
be directly related to the "TCP_SWAPFAIL_MISS" entries I am also getting
in my access log.

A request comes in, gets a TCP_MISS. I thought that it then caches the
normalized url. But then the next request for this same url gets a
TCP_SWAPFAIL_MISS. After that, it gets TCP_HITs. The TCP_SWAPFAIL_MISS
must be causing it to cache the original url.

Does this added info mean anything to anyone? I'm trying to research
what causes the TCP_SWAPFAIL_MISS now and see if I can find anything.
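
For anyone who wants to check their own access log for the same
MISS / SWAPFAIL_MISS / HIT pattern, here is a minimal sketch that groups
result codes per url. It assumes the native squid access.log layout
(time, elapsed, client, code/status, bytes, method, URL, ...). The
sample lines and the function names are made up for illustration and
are not taken from my real logs.

```python
from collections import defaultdict

def result_codes_by_url(lines):
    """Map each URL to the ordered list of result codes logged for it."""
    codes = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        # Assumed native format: time elapsed client code/status bytes method URL ...
        code = fields[3].split("/", 1)[0]
        codes[fields[6]].append(code)
    return dict(codes)

def urls_with_swapfail(lines):
    """Keep only the URLs that logged at least one TCP_SWAPFAIL_MISS."""
    return {
        url: seq
        for url, seq in result_codes_by_url(lines).items()
        if "TCP_SWAPFAIL_MISS" in seq
    }

# Illustrative sample only (invented timestamps, clients and sizes).
SAMPLE_LOG = [
    "1258570277.101 120 10.0.0.1 TCP_MISS/200 5120 GET http://www.fetchme.com/123456 - DIRECT/192.0.2.10 text/html",
    "1258570280.412 95 10.0.0.2 TCP_SWAPFAIL_MISS/200 5120 GET http://www.fetchme.com/123456 - DIRECT/192.0.2.10 text/html",
    "1258570283.007 2 10.0.0.3 TCP_HIT/200 5120 GET http://www.fetchme.com/123456 - NONE/- text/html",
    "1258570284.650 110 10.0.0.1 TCP_MISS/200 2048 GET http://www.fetchme.com/999 - DIRECT/192.0.2.10 text/html",
    "1258570285.900 1 10.0.0.2 TCP_HIT/200 2048 GET http://www.fetchme.com/999 - NONE/- text/html",
]

for url, seq in urls_with_swapfail(SAMPLE_LOG).items():
    print(url, " -> ".join(seq))
```

Against a real log, open access.log and pass the file object in place of
SAMPLE_LOG.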

Thanks so much!

-----Original Message-----
From: Kathleen M Kelly [mailto:kmkelly_at_yahoo-inc.com]
Sent: Tuesday, November 17, 2009 5:14 PM
To: squid-users_at_squid-cache.org
Subject: [squid-users] storeurl_rewriter and "URL mismatch" log entries

Hello,

I have a squid application where a storeurl_rewriter program is needed
to normalize differing incoming fetch urls into a single url. I am
running squid-2.7_9, and have a storeurl_rewriter program that gets
launched at squid startup. In my config file, I set
storeurl_rewrite_children to 100, and only allow http requests to go
through the rewriter (using the storeurl_access allow proto HTTP
setting, so any cache_object://localhost requests will be ignored).

The problem I am having is that I keep getting cache.log entries that
look like "storeClientReadHeader: URL mismatch", comparing the incoming
fetch url with my normalized url. I am getting thousands of these errors
an hour. I made sure to clear out the squid cache when I launched my new
storeurl_rewriter program, so I don't understand how the cache could
contain any of the original fetch urls to cause this URL mismatch.
Earlier I was getting warnings about not enough storeurl_rewriter
programs running, which is when I raised storeurl_rewrite_children to
100. I was also having an issue where the logs showed that my program
shut down whenever cache_object://localhost requests came through, which
is why I added the access setting. I no longer see either of these
warnings in the logs, yet I still see tons of URL mismatch errors.
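
To get a picture of which url pairs are involved, here is a rough sketch
that counts the pairs reported in cache.log. It assumes the two urls
appear on one line as "{a} != {b}"; it does not care whether the
"storeClientReadHeader: URL mismatch" text is on that same line or the
line before. The sample lines and names are invented for illustration.

```python
import re
from collections import Counter

# Matches "{left} != {right}" anywhere in a line.
MISMATCH_RE = re.compile(r"\{(?P<left>[^{}]*)\}\s*!=\s*\{(?P<right>[^{}]*)\}")

def count_mismatch_pairs(lines):
    """Count how often each (left, right) url pair is reported."""
    pairs = Counter()
    for line in lines:
        match = MISMATCH_RE.search(line)
        if match:
            pairs[(match.group("left"), match.group("right"))] += 1
    return pairs

# Illustrative sample only (invented timestamps).
SAMPLE_CACHE_LOG = [
    "2009/11/17 17:00:01| storeClientReadHeader: URL mismatch",
    "2009/11/17 17:00:01| {http://www.normalized.com/123456} != {http://www.fetchme.com/123456}",
    "2009/11/17 17:00:05| {http://www.normalized.com/123456} != {http://www.fetchme.com/123456}",
    "2009/11/17 17:00:09| {http://www.normalized.com/777} != {http://www.fetchme.com/777}",
]

for (left, right), count in count_mismatch_pairs(SAMPLE_CACHE_LOG).most_common():
    print(count, left, "!=", right)
```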

Does anyone have any ideas on this?

Just to clarify a bit more: suppose a fetch url looks like
http://www.fetchme.com/123456. It should go through my
storeurl_rewriter program, where it will be turned into
http://www.normalized.com/123456, and this is what should be used to
fetch and then cache. My assumption is that every fetch request goes
through storeurl_rewriter, so how is it that I would be seeing so many
log entries like "storeClientReadHeader: URL mismatch
{http://www.normalized.com/123456} != {http://www.fetchme.com/123456}"?
Is there some case where a fetch request does not go through
storeurl_rewriter? If the rewriter were busy, I think I would be seeing
a log warning saying so, which I am not since I raised children to 100.
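
To make the mapping concrete, here is a minimal sketch of the kind of
helper I mean. It is not my actual program. It assumes squid hands the
helper one request per line with the url as the first
whitespace-separated field, that the helper answers with one line per
request, and that an empty line means "leave the url unchanged". It also
assumes helper concurrency is off, so there is no channel ID in front of
the url. HOST_MAP and the function names are made up for illustration.

```python
import io
import sys
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping, using the example hosts from this message.
HOST_MAP = {"www.fetchme.com": "www.normalized.com"}

def normalize(url):
    """Return the normalized url, or "" if this url should not be rewritten."""
    parts = urlsplit(url)
    target = HOST_MAP.get(parts.netloc.lower())
    if target is None:
        return ""  # assumed meaning of an empty reply: no rewrite
    return urlunsplit((parts.scheme, target, parts.path, parts.query, ""))

def handle_line(line):
    """Turn one request line into one reply line (without the newline)."""
    fields = line.split()
    if not fields:
        return ""
    return normalize(fields[0])

def serve(stdin, stdout):
    """Answer requests until squid closes the pipe."""
    for line in stdin:
        stdout.write(handle_line(line) + "\n")
        stdout.flush()  # reply immediately; a buffered reply would stall squid

# Small in-memory demonstration with an invented request line.
demo_in = io.StringIO("http://www.fetchme.com/123456 10.0.0.1/- - GET -\n")
demo_out = io.StringIO()
serve(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

When run as a real helper, the demonstration at the bottom would be
replaced by a call to serve(sys.stdin, sys.stdout).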

Thanks so much,

Kathleen
Received on Wed Nov 18 2009 - 18:52:20 MST
