Re: [squid-users] Questions in Squid source code

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 30 Dec 2009 15:34:02 +1300

On Tue, 29 Dec 2009 17:37:42 -0800, Manjusha Maddala
<mmaddala25_at_nextag.com> wrote:
> Hi all,
>
> I'm working with Squid-2.6 and right now stuck on a bunch of questions.
> Would appreciate a word from the Squid experts.

Then you want to be mailing the squid-dev mailing list where the experts
are. This is a place for _users_ to help each other, with the emphasis on
configuration file problems and how to set things up.

>
> 1. What is SwapDir?

A cache directory (cache_dir config line). I think.

> Is that the in-memory representation of the disk
> cache? What does the in-memory representation of the disk cache look
> like - does it follow the same format as the swap.state file?

No.

>
> 2. What is StoreEntry?

An HTTP object. The in-memory representation of a disk file: storage
details, meta data, HTTP headers, binary body data.

>
> 3. In squid/src/structs.h,
>
> what do each of the entries in the below structure symbolize?
>
> struct _cacheSwap {
> SwapDir *swapDirs;

An array with one entry per cache_dir line.

> int n_allocated;

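Most likely the number of slots currently allocated in the swapDirs array
(its capacity), which can be larger than the number of cache_dir lines.

> int n_configured;

Most likely the number of those slots actually filled in from cache_dir
lines in squid.conf.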

> } cacheSwap;
>
> when/where do they get initialized?

By something in the configuration file parser.
Look for a function parse_X(), where X is the TYPE: value assigned to the
cache_dir option in src/cf.data.pre.
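
As a rough illustration only (this is not the Squid code), here is how a
parser could grow such an array while reading cache_dir lines. It assumes
n_allocated means "slots available" and n_configured means "slots in use".
DirSketch, SwapSketch and sketch_add_dir() are made-up names for this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for SwapDir; the real struct holds far more. */
typedef struct {
    char path[64];
} DirSketch;

/* Hypothetical stand-in for cacheSwap. */
typedef struct {
    DirSketch *dirs;
    int n_allocated;   /* assumed meaning: slots available in dirs[] */
    int n_configured;  /* assumed meaning: slots filled from the config */
} SwapSketch;

/*
 * Append one directory, doubling the array when it is full.
 * Returns 0 on success, -1 if memory could not be allocated.
 */
static int sketch_add_dir(SwapSketch *s, const char *path)
{
    if (s->n_configured == s->n_allocated) {
        int new_cap = s->n_allocated ? s->n_allocated * 2 : 4;
        DirSketch *tmp = realloc(s->dirs, (size_t)new_cap * sizeof(*tmp));
        if (tmp == NULL)
            return -1;
        s->dirs = tmp;
        s->n_allocated = new_cap;
    }
    /* Copy the path, always leaving room for the terminating NUL. */
    strncpy(s->dirs[s->n_configured].path, path,
            sizeof(s->dirs[0].path) - 1);
    s->dirs[s->n_configured].path[sizeof(s->dirs[0].path) - 1] = '\0';
    s->n_configured++;
    return 0;
}
```

Starting from a zeroed SwapSketch and adding five directories leaves
n_configured at 5 and n_allocated at 8 in this sketch.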

>
> 4. Each time squid -k rotate is done, I notice a new swap.state file
> gets added along with a 0 byte swap.state.last-clean file. How is the
> new swap.state file built? Is the in-memory hashtable/map dumped into
> this file during rotate or is it built by crawling all the directories
> in the disk cache and fetching the meta data of each file?

Both. The swap.state is a re-formatted journal dump of the in-memory cache
index generated at rotate time.
The in-memory cache index is built from 1) loading a previous swap.state
file (CLEAN load), 2) scanning the disk cache item-by-item (DIRTY load),
and 3) adding/removing entries during live operation.

>
> 5. Once the swap.state file is built, it keeps growing until the next
> periodic squid rotate is kicked off. What are these new entries that get
> appended to swap.state? I'm guessing each time a new webpage gets
> cached,
> 5.1) the in-memory table gets updated with the meta data for the new URI
> 5.2) one entry is made in store.log with a "SWAPOUT" tag
> 5.3) one entry is made in swap.state with the meta data for the new URI
>
> Somewhere in between the two squid rotate jobs, the cache replacement
> thread comes in and evicts the least recently used pages. The memory
> hashtable gets updated accordingly, *but* the swap.state file doesn't.
> Hence, over time swap.state file grows and needs to be synced up with
> the memory table.

swap.state is a _journal_. A removal record is added to it when something
gets removed, a file meta record when something gets added, and both when
something gets changed.
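
To make the journal idea concrete, here is a minimal sketch of replaying
such a log to find out which files are still live. The record layout, the
JOURNAL_* names and replay_journal() are invented for this example and are
not the real swap.state format.

```c
#include <stddef.h>

#define MAX_FILES 16

/* Hypothetical operations; a change is logged as a DEL followed by an ADD. */
enum journal_op { JOURNAL_ADD = 1, JOURNAL_DEL = 2 };

/* Hypothetical record: which operation, on which cache file number. */
struct journal_rec {
    enum journal_op op;
    int fileno;
};

/*
 * Replay the records in order. present[f] ends up 1 if file f is still
 * in the cache, 0 otherwise. Returns the number of live files.
 * Records with an out-of-range file number are ignored.
 */
static int replay_journal(const struct journal_rec *recs, size_t n,
                          unsigned char present[MAX_FILES])
{
    int live = 0;
    size_t i;

    for (i = 0; i < MAX_FILES; i++)
        present[i] = 0;

    for (i = 0; i < n; i++) {
        int f = recs[i].fileno;
        if (f < 0 || f >= MAX_FILES)
            continue;
        if (recs[i].op == JOURNAL_ADD) {
            if (!present[f])
                live++;
            present[f] = 1;
        } else if (recs[i].op == JOURNAL_DEL) {
            if (present[f])
                live--;
            present[f] = 0;
        }
    }
    return live;
}
```

Because later records override earlier ones, the file can keep growing
between rotations and still describe the current cache contents; a rotate
just writes a compact copy with the dead records dropped.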

>
> Did I get it right?
>
> 6. Is there any utility to read the swap.state file?

Yes. Look up the third-party squidpurge tool.

>
> 7. swap.state file is maintained for loading the in-memory hashtable at
> squid startup. When else is this file used?

All the update times you thought of...

>
> 8. A high-level pseudo code for the request processing algorithm as I
> understand:
>
> - Squid receives a GET request for URL
> - Computes a hash for the URL and uses it as a key to pull the record
> from its internal memory representation of the meta-data of all files on
> the disk cache
> - If a matching record is found, the refresh_pattern rules are applied
> to determine if the content is fresh or stale and a TCP_HIT or
> TCP_REFRESH_HIT/TCP_REFRESH_MISS get logged respectively
> - If no record is found, it's a TCP_MISS
>
> Have I missed something?
>
>
> Thanks.
>
> CONFIDENTIALITY NOTICE
> =======================
> This email message and any attachments are for the exclusive use of the
> intended recipient(s)

NOTE: none of the intended recipients have yet received this email.
Instead it went to a large group of administrators, few of whom can help.

Amos
Received on Wed Dec 30 2009 - 02:34:13 MST