Re: memory-mapped files in Squid

From: Kevin Littlejohn <darius@dont-contact.us>
Date: Tue, 02 Feb 1999 11:30:29 +1100

>>> Henrik Nordstrom wrote

> > Aah, you mean that when some object is hit 10 times/sec then it
> > shouldn't be rewritten to the FIFO head at the same rate?
>
> Yes, among other things.
>
> > btw, how did you estimate?
>
> A rough calculation of refresh rates (both TCP_REFRESH and
> TCP_CLIENT_REFRESH). But as I said it is an early estimate and I may
> have missed something important. As Alex said, some simulations should
> be done to get a clearer picture of how things behave.

I think this is going to be a big part of the tuning for the cyclic fs -
how do you deal with, for instance, the next MS service pack: 50Mb of
heavily-hit object? Personally, rewriting objects on every hit doesn't
give me warm fuzzy feelings...
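
Something like this is what I'd want instead - only re-copy a hot object
once it has drifted far enough back from the write head. A compile-able
sketch, with all names and numbers invented, nothing from the tree:

#include <stdio.h>

#define REWRITE_DISTANCE (64L * 1024 * 1024)    /* tune against disk size */

struct fifo_entry {
    long disk_offset;           /* where the object currently lives */
};

static long write_head;         /* current FIFO write pointer */

/* Only rewrite a hot object to the head once it has drifted far enough
 * back; a 10 hits/sec object then costs one copy per REWRITE_DISTANCE
 * of churn instead of 10 copies/sec.  (Wraparound handling omitted.) */
static int
should_rewrite(const struct fifo_entry *e)
{
    return (write_head - e->disk_offset) > REWRITE_DISTANCE;
}

int
main(void)
{
    struct fifo_entry sp = { 0 };       /* the 50Mb service pack */
    write_head = 16L * 1024 * 1024;
    printf("rewrite now? %s\n", should_rewrite(&sp) ? "yes" : "no");
    return 0;
}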

> > We need to rewrite squid store interface I guess.
>
> Yes. Some kind of abstract store interface is needed. It is kind of
> hard to tweak the fd-centric design into a non-fd-centric system. This
> applies to all communication, not only disk I/O.
>
> > We don't want to add indirection layer between swap_file_number and
> > physical location on disk.
>
> No, and it has never been the intention to use such an indirection
> layer. That would essentially reimplement a kind of directory
> structure, which was one of the things I wanted to get rid of.
>

This stuff applies to any squidFS - the aim should be for squid to store,
internally, a direct pointer to the object's location on disk, rather
than any level of indirection. I think that's one of the crucial points
- and yeah, we could do with a slight abstraction of the current disk
handling, so there aren't read() and write() calls sprinkled through
asyncio and disk.c, and so what's stored as a 'name' (and what's stored
as an 'fd' for open files) is easily changeable.
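
Roughly what I'm picturing is a per-cache_dir table of operations, so a
backend can be swapped without touching the callers. All names below are
invented - nothing like this exists in the tree yet, it's just a sketch:

/* The rest of squid talks to one of these and never to open()/read()/
 * write() directly; 'handle' might be an fd today and a (disk, offset)
 * pair under sfs tomorrow. */
typedef struct _store_ops {
    void *(*open_obj)(void *state, long location, int flags);
    int   (*read_obj)(void *handle, char *buf, int len, long offset);
    int   (*write_obj)(void *handle, const char *buf, int len, long offset);
    int   (*close_obj)(void *handle);
    int   (*unlink_obj)(void *state, long location);
    void  *state;               /* per-cache_dir backend data */
} store_ops;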

Is anyone out there looking into this yet? Because I'm about three days
away from starting on the squid side of the sfs stuff - so if someone is
already eyeing off cleaning up squid's disk IO, I'd like to talk to them ;)

> Writing and reading should be isolated from each other. Of course an
> object should be able to be written back to another disk than the one
> it was read from. Objects should always be written to the most suitable
> disk, regardless of whether the object came from network, disk or
> whatever. An object is an object.

Except if you've already incurred the overhead of writing to disk, why
incur it again? I'm still not convinced that increasing the workload of
the disk is a good thing to do in the process of attempting to speed up
disk access. I know there are many other things affecting disk access
speeds - but that media is still the slow part of the chain (well, after
network), so it makes sense to me to keep disk use low.
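
For what it's worth, "most suitable" could start as simply as this - a
hypothetical sketch (names invented) that just picks the emptiest disk;
a real policy could also weigh pending I/O queue depth per spindle:

struct cache_dir_info {
    long capacity;
    long used;
};

/* Return the index of the cache_dir with the most free space,
 * or -1 if there are no dirs configured. */
static int
pick_cache_dir(const struct cache_dir_info *d, int ndirs)
{
    int i, best = 0;
    long best_avail;
    if (ndirs <= 0)
        return -1;
    best_avail = d[0].capacity - d[0].used;
    for (i = 1; i < ndirs; i++) {
        long avail = d[i].capacity - d[i].used;
        if (avail > best_avail) {
            best_avail = avail;
            best = i;
        }
    }
    return best;
}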

>
> > Like a sort of transaction log? If we had a global URL database, we
> > wouldn't need this. But this solution might be even better: hotswap
> > the drive, restart squid, and it runs with another subset of the URL
> > database...
>
> Yes, I see metadata logs as a sort of transaction log.
>
> In theory Squid can be programmed to hotswap the drive without a
> restart. What is needed are the functions "disable and release
> everything from cache_dir X" and "activate cache_dir X".

In fact, squid already handles a drive 'going away'. It doesn't currently
handle a drive 'coming online', but that would be almost trivial - just
get it to rebuild the cache_store index for that drive, plus some way of
signalling that the drive has come online... So long as all the data
pertaining to a drive is on that drive, and nowhere else, then you've got
a swappable setup.
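
Hypothetically, the pair Henrik describes might look like the following
(names invented, neither function exists in squid today):

#define MAX_CACHE_DIRS 64

static int dir_online[MAX_CACHE_DIRS];

/* "disable and release everything from cache_dir X" */
static void
storeDirDisable(int dirn)
{
    dir_online[dirn] = 0;
    /* ...release every StoreEntry whose only copy lives on dirn,
     * and stop scheduling new swapouts to it... */
}

/* "activate cache_dir X": rebuild the index from the drive itself,
 * which only works if all metadata for a drive lives on that drive */
static void
storeDirActivate(int dirn)
{
    /* ...replay the drive's own metadata log into the store index... */
    dir_online[dirn] = 1;
}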

> > uhh. can we avoid large multipart files on fifo storage?
>
> As you said, a hybrid design could be used, where some storage is FIFO
> and some filesystem based.
>
> A third possibility is a spool area for larger objects, from which
> completed objects are written to the FIFO. This area can also be
> managed using FIFO to automatically clean out any fragmentation.

I'd be curious to see what the 'best' size cutoff is before objects go
via this 'staging area'. I'd also be curious to see what impact it has
on performance if the object sizes drift - if you're heavily hitting
that area, you may start to lose some of the cyclic gains :(
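
i.e. something like this - a sketch where the threshold is pure
guesswork and exactly the knob I'd want to experiment with:

#define STAGING_THRESHOLD (256 * 1024)      /* bytes; pure guesswork */

enum store_target { TO_FIFO, TO_STAGING };

static enum store_target
route_object(long expected_size)
{
    /* unknown-length replies (-1) go to the staging area too, so an
     * aborted transfer never leaves a hole in the FIFO */
    if (expected_size < 0 || expected_size > STAGING_THRESHOLD)
        return TO_STAGING;
    return TO_FIFO;
}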

>
> There are a couple of other possibilities as well.
>
> You may remember my wording "multilevel" in earlier messages. This came
> from the idea that the store could be maintained at multiple levels,
> where the first level blindly writes objects and the second (and third
> ...) level eats objects from the previous level's tail onto the next
> level's FIFO. This is a good idea if the first level wastes a lot of
> space on objects that are then thrown away (refreshed or aborted during
> storage), but it may be hard to load balance such a system...

Cute idea.
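
If I read it right, the reaper at each level looks something like this
(a compile-able sketch, all names made up, helpers assumed):

#include <stddef.h>

struct object {
    struct object *next;
    int refcount;               /* still indexed by squid? */
    int stale;                  /* refreshed or aborted during storage */
};

struct fifo {
    struct object *tail;        /* oldest object in this level */
};

/* assumed helpers, not real squid calls */
extern void fifo_append(struct fifo *f, struct object *o);
extern void object_discard(struct object *o);

/* Called when 'from' needs space: eat up to 'count' objects off its
 * tail, demoting the ones still worth keeping to the next level. */
static void
reap_tail(struct fifo *from, struct fifo *to, int count)
{
    struct object *o;
    while (count-- > 0 && (o = from->tail) != NULL) {
        from->tail = o->next;
        if (o->refcount > 0 && !o->stale)
            fifo_append(to, o);     /* still wanted: demote a level */
        else
            object_discard(o);      /* waste never reaches level 1 */
    }
}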

There are definitely some nifty ideas there, but I'm not convinced the
gains from a cyclic fs over a more traditional style fs are enough to
warrant the extra management complexity - shuffling objects around on
disk, etc. Guess the only real way to tell is to implement and see ;)

KevinL
(Who can see much experimentation coming up... time to requisition a new
cache box ;)