An alternative to USE_TRUNCATE_NOT_UNLINK, efficient and without race conditions.

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Mon, 24 May 1999 01:50:03 +0000

Arjan de Vet wrote:

> What my patches address are not strictly bugs but what I would call
> 'annoying behavior'. I'd like to keep the fileset and the directories as
> small as possible without enormous amounts of unused and truncated
> files. Therefore the 'suggestion' variable is reset if possible when it
> has reached the optimal number of files (L1 * L2 * L2).

Optimal is a subjective term. Your patch will cause a lot of bitmap
searching on a nearly full cache, and it will only work if the
estimated average object size is less than the actual one.
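
To illustrate (a simplified sketch only, not Squid's actual
file_map_allocate code): once the search pointer is reset to a low
value, the allocator has to scan past all the bits that are already in
use before it finds a hole, and on a nearly full cache that is most of
the map.

static int find_free_fileno(const unsigned char *map, int nbits, int suggestion) {
  int i;
  /* on a nearly full cache most bits from 'suggestion' upward are
   * already set, so this scan gets long whenever 'suggestion' is
   * reset instead of advancing past slots known to be in use */
  for (i = suggestion; i < nbits; i++) {
    if (!(map[i >> 3] & (1 << (i & 7))))
      return i;   /* free slot found */
  }
  return -1;      /* no free slot */
}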

> It looks like storeDirClean never removes or only truncates misplaced
> files when USE_TRUNCATE_NOT_UNLINK has been defined (the default). I
> think files which are in the wrong directory should be unlinked,
> independent of the USE_TRUNCATE_NOT_UNLINK setting, because it
> unnecessarily uses inodes.

Definitely a bug, not a misfeature. I think a simpler approach is to
simply remove the #if USE_TRUNCATE_NOT_UNLINK blocks from storeDirClean.
Your patch seems a bit counter-productive as it will not only remove
misplaced files, but also the empty files Squid has prepared for
itself.

> The problem is that the value of the 'suggestion' variable in
> file_map_allocate increases until max_n_files has been reached; then it
> will be reset to 0. max_n_files has no direct relation to L1 and L2,
> therefore this problem can occur.

This is intentional: when unlink is used it gets the directories
reasonably cleaned out before they are reused. And it is not entirely
true that max_n_files has no direct relation to L1 and L2. In the
default config, only as many of the L1*L2 directories as are needed to
store max_n_files are used. Only if you manually tune L1 to a value
that is too small will the allocation wrap around and store more files
in the first directories.

Also, be warned that there is a race condition when files are reused.
Squid marks a file unused as soon as it decides it is time to throw
that content away, but the unlink/truncate is asynchronous and Squid
may in some cases reuse the file before it gets unlinked/truncated.
Your change will cause this to happen more frequently. When unlink was
used this was not such a big problem (Squid happily carried on if it
did happen), but it looks like truncate can make things a lot worse.
With unlink the cache file was either OK or completely gone, but
truncate can result in malformed cache files where the beginning of
the file is gone (Squid recycles a file number and reuses it before
the truncate is carried out; the truncate then removes the initial
contents of the new entry).

Here is something constructive to replace USE_TRUNCATE_NOT_UNLINK
with: a similar but very different approach that should give a
noticeable performance improvement on all platforms, a better
unlink/truncate pattern and no race condition:

When data is recycled, simply reuse the files as is. Don't bother
unlinking or truncating these unless the store is above the high water
mark. Instead simply overwrite the file, and truncate it on close.
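
A rough sketch of that policy (every name here -- reuse_list,
store_swap_size, high_water_mark, swap_path_of(), mark_fileno_free(),
struct store_entry -- is made up for illustration, not an existing
Squid function; the list operations are the ones defined further down
in this mail):

void store_recycle_file(struct store_entry *e) {
  if (store_swap_size > high_water_mark) {
    /* above the high water mark: really remove the file so the
     * inode and the directory entry are given back */
    unlink(swap_path_of(e));
    mark_fileno_free(e->fileno);
  } else {
    /* otherwise keep the file around untouched; the next swap-out
     * simply overwrites it and truncates it to size on close */
    list_add_head(&e->lent, &reuse_list);
  }
}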

Details on how to do it:

storeMaintainSwapSpace:
Maintain a list of reuse candidates if the store is above the low water
mark. Use some kind of sliding average to calculate how much the list
needs to be refilled each second, or try to keep it at a fixed limit N.
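
For the sliding average, something along these lines could work (a
sketch only; the smoothing factor and the one-second margin are
arbitrary assumptions):

/* Called once per second; returns how many new release candidates
 * should be added to the reuse list this round. */
static int reuse_list_refill(int reused_last_second, int queued_now) {
  static double rate = 0.0;    /* smoothed reuses per second */
  const double alpha = 0.25;   /* smoothing factor (assumption) */
  int target, need;
  rate = alpha * reused_last_second + (1.0 - alpha) * rate;
  target = (int) rate + 1;     /* keep about one second's worth queued */
  need = target - queued_now;
  return need > 0 ? need : 0;
}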

storeSwapOutStart:
Use files from the release candidate list. Only allocate new files if
there are no release candidates.
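
In other words, something like this (take_reuse_candidate() and
allocate_new_fileno() are hypothetical helpers; the latter would be
the existing bitmap allocation):

static int swapout_pick_fileno(void) {
  int fn = take_reuse_candidate();   /* -1 when the reuse list is empty */
  if (fn < 0)
    fn = allocate_new_fileno();      /* fall back to a brand new file */
  return fn;
}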

storeCheckSwapOut:
Release objects from the "to be freed" list if the store gets above the
high water mark. Those objects preferably get unlinked, to get rid of
the excess inodes and directory entries when the average object size
grows.

storeSwapOutFileClose:
Truncate the file to the correct size before closing the file.
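
That is just an ftruncate() on the still-open descriptor, so no extra
pathname lookup is needed (a sketch; swap_size stands for however many
bytes were actually written for the object):

#include <unistd.h>

static int swapout_close(int fd, off_t swap_size) {
  /* cut off any stale tail left over from the file's previous content */
  if (ftruncate(fd, swap_size) < 0)
    return -1;
  return close(fd);
}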

storeGet / storeLock:
If the object is on the "to be freed" list, then move it back onto the
normal store_list. This can be done with a small change in the
implementation of dlists (use a dummy node for the head and the tail
of the list instead of checks for the known head/tail; code examples
below).

storeRelease:
Move the entry to the head of the "to be reused" list (the end that
entries are reused from).

/*
 * Implementation of a simple doubly-linked list with unnamed
 * delete/insert operations (only add is named)
 */

struct list_entry {
  struct list_entry *next;
  struct list_entry *prev;
};

struct list {
  /* this can on most platforms/compilers be optimized to 3 pointers,
   * but this way it is more portable
   */
  struct list_entry internal_list_head;
  struct list_entry internal_list_tail;
#define list_head internal_list_head.next
#define list_tail internal_list_tail.prev
};

void init_list (struct list *l) {
  /* the dummy head and tail point at each other; an empty list has
   * list_head == &internal_list_tail and list_tail == &internal_list_head */
  l->internal_list_head.next = &l->internal_list_tail;
  l->internal_list_head.prev = NULL;
  l->internal_list_tail.next = NULL;
  l->internal_list_tail.prev = &l->internal_list_head;
}

void list_delete(struct list_entry *e) {
  e->next->prev = e->prev;
  e->prev->next = e->next;
  e->prev = NULL;
  e->next = NULL;
}

void list_add_head(struct list_entry *e, struct list *l) {
  /* link in right after the dummy head */
  e->prev = &l->internal_list_head;
  e->next = l->list_head;
  e->next->prev = e;
  l->list_head = e;
}

void list_add_tail(struct list_entry *e, struct list *l) {
  /* link in right before the dummy tail */
  e->next = &l->internal_list_tail;
  e->prev = l->list_tail;
  e->prev->next = e;
  l->list_tail = e;
}

void list_insert_after(struct list_entry *e, struct list_entry *ref) {
  e->next = ref->next;
  ref->next = e;
  e->prev = ref;
  e->next->prev = e;
}

void list_insert_before(struct list_entry *e, struct list_entry *ref) {
  e->prev = ref->prev;
  ref->prev = e;
  e->next = ref;
  e->prev->next = e;
}

/* Iterate through all members head to tail */
  struct list a_list;
  struct list_entry *e;
  for (e = a_list.list_head; e->next != NULL; e = e->next) {
    /* process list_entry e */
  }

/* Iterate through all members tail to head */
  struct list a_list;
  struct list_entry *e;
  for (e = a_list.list_tail; e->prev != NULL; e = e->prev) {
    /* process list_entry e */
  }

And a simple macro to transform the address of an embedded list_entry
structure into the address of the containing structure (this would
allow us to save 8 bytes on each dlist entry == each cache object and
ip/fqdn cache entry):

#define MEMBER_TO_BASE(type, member, ptr) \
   ((type *)((char *)(ptr) - ((char *)&(((type *)NULL)->member) - (char *)NULL)))

( pointer minus the offset of the member in the base type, with casts
to byte pointers for the arithmetic )

example:

struct something {
   int i1, i2;
   struct list_entry lent;
   char *s;
};

  struct list_entry *le;
  struct something *e;
  le = a_list.list_head;
  e = MEMBER_TO_BASE(struct something, lent, le);