Questions in Squid source code

From: Manjusha Maddala <mmaddala25_at_nextag.com>
Date: Wed, 30 Dec 2009 10:29:04 -0800

Hi all,

I'm working with Squid-2.6 and am currently stuck on several questions.
I would appreciate a word from the Squid experts.

1. What is SwapDir? Is that the in-memory representation of the disk
cache? What does the in-memory representation of the disk cache look
like - does it follow the same format as the swap.state file?

2. What is StoreEntry?

3. In squid/src/structs.h,

    what does each field in the structure below represent?

    struct _cacheSwap {
        SwapDir *swapDirs;
        int n_allocated;
        int n_configured;
    } cacheSwap;

    When and where do they get initialized?

4. Each time squid -k rotate is run, I notice that a new swap.state file
gets added along with a 0-byte swap.state.last-clean file. How is the
new swap.state file built? Is the in-memory hashtable/map dumped into
this file during rotate, or is it built by crawling all the directories
in the disk cache and fetching the metadata of each file?

5. Once the swap.state file is built, it keeps growing until the next
periodic squid rotate is kicked off. What are the new entries that get
appended to swap.state? My guess is that each time a new web page gets
cached:
5.1) the in-memory table gets updated with the metadata for the new URI
5.2) one entry is made in store.log with a "SWAPOUT" tag
5.3) one entry is made in swap.state with the metadata for the new URI

Somewhere between two squid rotate jobs, the cache replacement
thread comes in and evicts the least recently used pages. The in-memory
hashtable gets updated accordingly, *but* the swap.state file doesn't.
Hence, over time the swap.state file grows and needs to be synced up with
the in-memory table.
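
To make the guess in question 5 concrete, here is a minimal sketch in C. It is not Squid code: ToyCache, ToyEntry, toy_add, toy_evict, toy_live and toy_rotate are hypothetical names, and the on-disk log is reduced to a record counter. It only encodes the model described above (additions append a record, evictions touch memory only, rotate rewrites the log from memory), which may well differ from what Squid really does.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX 8

/* Hypothetical stand-in for one in-memory index record. */
typedef struct {
    char url[64];
    int in_use;
} ToyEntry;

/* Hypothetical cache: an in-memory table plus a count of log records. */
typedef struct {
    ToyEntry table[TOY_MAX];
    int log_records;   /* records currently in the toy "swap.state" */
} ToyCache;

static void toy_init(ToyCache *c) {
    assert(c != NULL);
    memset(c, 0, sizeof(*c));
}

/* New object cached: update memory AND append one log record. */
static int toy_add(ToyCache *c, const char *url) {
    int i;
    for (i = 0; i < TOY_MAX; i++) {
        if (!c->table[i].in_use) {
            strncpy(c->table[i].url, url, sizeof(c->table[i].url) - 1);
            c->table[i].url[sizeof(c->table[i].url) - 1] = '\0';
            c->table[i].in_use = 1;
            c->log_records++;
            return 0;
        }
    }
    return -1; /* table full */
}

/* Eviction, as guessed above: memory is updated, the log is not. */
static int toy_evict(ToyCache *c, const char *url) {
    int i;
    for (i = 0; i < TOY_MAX; i++) {
        if (c->table[i].in_use && strcmp(c->table[i].url, url) == 0) {
            c->table[i].in_use = 0;
            return 0;
        }
    }
    return -1; /* not found */
}

/* Number of entries still live in memory. */
static int toy_live(const ToyCache *c) {
    int i, n = 0;
    for (i = 0; i < TOY_MAX; i++)
        if (c->table[i].in_use)
            n++;
    return n;
}

/* Rotate, as guessed above: rewrite the log from the in-memory table. */
static void toy_rotate(ToyCache *c) {
    c->log_records = toy_live(c);
}
```

In this model, adding two URLs and evicting one leaves two log records but only one live entry, and toy_rotate brings the log back down to one record.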

Did I get it right?

6. The swap.state file is maintained for loading the in-memory hashtable at
squid startup. When else is this file used?

7. High-level pseudocode for the request processing algorithm, as I
understand it:

        - Squid receives a GET request for a URL
        - It computes a hash of the URL and uses it as a key to pull the
          record from its in-memory representation of the metadata of
          all files in the disk cache
        - If a matching record is found, the refresh_pattern rules are
          applied to determine whether the content is fresh or stale, and
          a TCP_HIT or a TCP_REFRESH_HIT/TCP_REFRESH_MISS gets logged
          respectively
        - If no record is found, it's a TCP_MISS
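
The flow above can be sketched in C as follows. This is an illustration of my understanding only, not Squid code: ReqEntry, ReqOutcome, req_lookup and req_classify are hypothetical names, the index is a plain array scanned linearly instead of a hash table, and a single fresh_for value stands in for whatever the refresh_pattern rules would decide.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical outcomes; a stale hit would be logged as
 * TCP_REFRESH_HIT or TCP_REFRESH_MISS after revalidation. */
typedef enum { REQ_MISS, REQ_FRESH_HIT, REQ_STALE_HIT } ReqOutcome;

/* Hypothetical metadata record for one cached object. */
typedef struct {
    const char *url;
    time_t stored_at;   /* when the object was cached */
    time_t fresh_for;   /* stand-in for the refresh_pattern result */
} ReqEntry;

/* Find the record for a URL (linear scan here; the real index is
 * described above as keyed by a hash of the URL). */
static const ReqEntry *req_lookup(const ReqEntry *index, size_t n,
                                  const char *url) {
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(index[i].url, url) == 0)
            return &index[i];
    return NULL;
}

/* Classify a GET request: miss, fresh hit, or stale hit. */
static ReqOutcome req_classify(const ReqEntry *index, size_t n,
                               const char *url, time_t now) {
    const ReqEntry *e;
    assert(url != NULL);
    e = req_lookup(index, n, url);
    if (e == NULL)
        return REQ_MISS;            /* would be logged as TCP_MISS */
    if (now - e->stored_at <= e->fresh_for)
        return REQ_FRESH_HIT;       /* would be logged as TCP_HIT */
    return REQ_STALE_HIT;           /* needs revalidation */
}
```

With one entry stored at time 100 and fresh for 50 seconds, a request at time 120 is a fresh hit, a request at time 200 is a stale hit, and a request for any other URL is a miss.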

Have I missed something?

8. Are swap.state and the in-memory cache index always in sync? If yes, why
does the swap.state file get compacted at each squid rotate interval?

Thanks.

Received on Wed Dec 30 2009 - 18:29:17 MST
