RE: [squid-users] RE: Forcing squid to cache files

From: Volker-Yoblick, Adam <avolker_at_ea.com>
Date: Fri, 10 Dec 2010 06:59:54 -0800

Another related question:

I notice that the lastmod and expires values for every line in my store.log are -1. Is squid unable to cache files without lastmod and expires headers?
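Suppose those -1 values mean the replies carry neither a Last-Modified nor an Expires header. In that case the percentage field of refresh_pattern, which is computed from the Last-Modified time, has nothing to work with, and freshness falls back roughly to the min field. With a min of 0, such objects are stale on arrival. As far as I can tell, Squid 3.1 releases replies that would go stale almost immediately and have no Last-Modified time to revalidate against, rather than storing them.

Below is a minimal sketch of a rule that gives the tool's files a non-zero minimum. It assumes a hypothetical server host name, files.example.com, and uses illustrative time values:

```squid
# Hypothetical rule for the file-transfer tool's server; the host name and
# the times are assumptions. Squid uses the first refresh_pattern whose
# regex matches the URL, so this line must sit above "refresh_pattern .".
# Fields: regex, min (minutes), percent, max (minutes)
refresh_pattern ^http://files\.example\.com/ 1440 20% 10080
```

Here 1440 minutes is one day and 10080 minutes is one week.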

-----Original Message-----
From: Volker-Yoblick, Adam [mailto:avolker_at_ea.com]
Sent: Thursday, December 09, 2010 9:59 PM
To: 'Amos Jeffries'; 'squid-users_at_squid-cache.org'
Subject: RE: [squid-users] RE: Forcing squid to cache files

No, none of the files are > 2GB. I know that's a limitation; that's why I mentioned it. =)

Anyone else know why the cache might not be populating correctly?

-----Original Message-----
From: Amos Jeffries [mailto:squid3_at_treenet.co.nz]
Sent: Thursday, December 09, 2010 9:39 PM
To: squid-users_at_squid-cache.org
Subject: Re: [squid-users] RE: Forcing squid to cache files

> -----Original Message-----
> From: Volker-Yoblick, Adam
>
> I have another related question:
>
> I can see my cache filling up, but I'm sending about 7 gigs through the proxy, and the cache doesn't even have 300 MB in it yet, and the transfer is at 62%.
>
> Looking in the store.log, I see a mix of RELEASE and SWAPOUT lines. Also, none of the files are > 2 GB.

If you are trying to store >2GB individual files, Squid has an accounting bug which screws up the size measures for the cache.
see http://bugs.squid-cache.org/show_bug.cgi?id=3068

This has been fixed for 3.1.10, and the bug fix "snapshot" bundles of
3.1 already contain it.
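Independently of that bug, Squid only stores objects no larger than maximum_object_size, which I believe defaults to just 4 MB in 3.1. Larger files would then likely show up as RELEASE rather than SWAPOUT in store.log.

Here is a sketch of raising the limit. The size, path and cache_dir figures are placeholders, not recommendations:

```squid
# Per-object size limit (value is an assumption). Kept below 2 GB to stay
# clear of bug 3068 on 3.1.9, and placed above cache_dir as a precaution.
maximum_object_size 1024 MB
# Placeholder cache: ufs store, 20000 MB, 16 first-level and 256
# second-level directories.
cache_dir ufs /var/spool/squid 20000 16 256
```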

>
> -----Original Message-----
> From: Volker-Yoblick, Adam
>
> Doh! I feel like a moron.
>
> Read up on the refresh_pattern command, and it seems that first 0 on the last line was causing everything to be marked as "not fresh" right away.
>
> I upped that value, and my cache is now filling up.
>
> Nothing to see here.... =)
>
> -----Original Message-----
> From: Volker-Yoblick, Adam
>
> Greetings squid users,
>
> I recently installed squid 3.1.9 on an RHEL 5 server, with no options when running ./configure.
>
> We have a proprietary tool that sends files from one machine to another over HTTP, and I wanted to have squid always cache the files to help improve transfer times when the tool is used from outside the building. Note that this cache will NEVER be used to serve webpages, so I don't care about violating HTTP protocol.
>

HTTP protocol is not about web pages. HTTP protocol is about reliable transfer and delivery of up-to-date and valid objects.

What you have done with the override-* and ignore-* options is tell Squid that the objects at each URL never change, and not to believe any software which states otherwise. This holds even when the web server or app producing them may be stating explicitly that they will change at a certain timestamp, or have already changed.

Luckily, they only apply to the Squid they are set in, so other caches doing similar bandwidth reduction outside yours will not be crippled.

What you need to do is check the accuracy of the headers being produced by the app, for both the server app and the client agent app, and bug its developers to fix any problems you find.

Note that 3.1 has almost full HTTP/1.1 support when talking to servers but only HTTP/1.0 features are reliable when talking to the client/visitors.

> I was able to set up the acls to allow a source connection from my machine, and to allow a destination connection to another machine. I can tell this is working, because when I start the transfer, I see lots of HTTP GET lines in my access.log.
>
> I also see lots of lines in my store.log, but unfortunately, all the lines are RELEASE lines, meaning nothing is being stored in the cache. I verified this by running du -hs on my cache dir, and the size is never going up.
>
> I've spent most of the day googling this issue (and looking at the squid FAQ), and it seems most users have the problem where they are not ignoring "no-cache" commands in the http headers. I tried to get around this in my squid.conf, shown below:
>

Sadly, Google is mostly filled with vocal people using Squid 2.5 or jumping to conclusions, or, like yourself, with a very narrow focus on HTTP and a specific problem app.

> refresh_pattern ^ftp: 1440 20% 10080 override-expire override-lastmod ignore-reload ignore-no-cache ignore-no-store ignore-must-revalidate ignore-private
> refresh_pattern ^gopher: 1440 0% 1440 override-expire override-lastmod ignore-reload ignore-no-cache ignore-no-store ignore-must-revalidate ignore-private
> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0 override-expire override-lastmod ignore-reload ignore-no-cache ignore-no-store ignore-must-revalidate ignore-private
> refresh_pattern . 0 20% 4320 override-expire override-lastmod ignore-reload ignore-no-cache ignore-no-store ignore-must-revalidate ignore-private
>
> This doesn't seem to fix the problem, however.
>
> I also made sure that the "squid" user is the owner of my cache_dir, and I made sure that cache_effective_user is set to "squid". Running squid -z returns no errors.
>
> I'm kinda stumped at this point.
>
> Anyone have any suggestions? Maybe a "gotcha" that I missed, or proper steps to debug this further?
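One way to debug this further is to have Squid log why it considers each object fresh or stale. The sketch below assumes the standard debug section numbering, in which section 22 covers the refresh calculation:

```squid
# Keep everything else at level 1, but log refresh (freshness)
# decisions from debug section 22 in detail to cache.log.
debug_options ALL,1 22,3
```

After editing squid.conf, squid -k parse checks the syntax and squid -k reconfigure applies it. The cache.log lines can then be compared against the RELEASE and SWAPOUT entries in store.log.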

The big gotcha, which you have found already, is that once the Cache-Control (CC) headers are ignored, that "0" minimum comes into effect immediately.

The HTTP headers may permit much longer caching times than the minimum if you override only the specific CC headers that are causing problems.
  "ignore-private" is particularly dangerous to use. The "private" control is sent when the content is destined for exactly one visitor and contains details only they are safe to see. Think banking details or secret government files. Overriding it will send such files to *all* visitors.
  Overriding "no-store" is similar, with a little less danger when the data leaks. Things using it might be shared by simultaneous visitors, but are not to be saved long-term.
  "no-cache" requests a brand-new copy even if one is already stored. For your use case this can be overridden.
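As a sketch of overriding only what is needed, the catch-all rule below keeps a non-zero minimum and ignores only no-cache. It leaves private, no-store and the other server controls in force. The 60-minute minimum is an illustrative assumption. The other options from the original configuration would be added back only where the server is known to get those headers wrong:

```squid
# Catch-all rule: absent an Expires header, an object is roughly fresh for
# at least 60 minutes (assumed value) or up to 20% of its Last-Modified
# age, but never beyond 4320 minutes (3 days).
# Only the server's no-cache control is overridden; ignore-private and
# ignore-no-store are deliberately left out.
refresh_pattern . 60 20% 4320 ignore-no-cache
```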

Or, better yet, prevent the app server from sending any of the above unless it absolutely has to.

The other overrides are less dangerous, as they only bias the algorithm used to calculate storage duration. For your use case, you can use whichever of them target headers you are sure the web app server is producing wrong. Discuss this with the app developers to gain that certainty, or to learn why they might be bad ideas.
Hopefully the outcome of that discussion will be a better web app with caching for everybody.

If/when you can ignore the client Cache-Control requests, set this up as a reverse proxy and use the "accel ignore-cc" options on http_port.
This safely ignores all the client cache controls, such as no-cache and reload, while retaining the trustworthy server ones.
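Here is a minimal reverse-proxy sketch. It assumes the tool's clients can be pointed at Squid as if it were the file server. The host name, address and ports are hypothetical placeholders:

```squid
# Accept requests as a reverse proxy for files.example.com (hypothetical).
# ignore-cc drops client Cache-Control request headers on this port only.
http_port 80 accel defaultsite=files.example.com ignore-cc
# Fetch misses from the real server (192.0.2.10 is a placeholder address).
cache_peer 192.0.2.10 parent 80 0 no-query originserver name=filesrv
acl tool_site dstdomain files.example.com
http_access allow tool_site
cache_peer_access filesrv allow tool_site
```

The http_access allow line needs to come before any final "http_access deny all" rule.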

Amos

--
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.9
   Beta testers wanted for 3.2.0.3
Received on Fri Dec 10 2010 - 15:00:04 MST

This archive was generated by hypermail 2.2.0 : Fri Dec 10 2010 - 12:00:01 MST