Re: [squid-users] file system type/params optimal for squid?

From: Linda W. <squid-user@dont-contact.us>
Date: Sat, 25 Oct 2003 23:41:09 -0700

Henrik Nordstrom wrote:

>On Sat, 25 Oct 2003, Linda W. wrote:
>
>
>
>>I was about to move my squid directory off onto its own partition and was
>>wondering what filesystem to use -- is there a Linux (x86) filesystem
>>that performs best for squid? Any special params for block size? It's
>>just a single SCSI disk.
>>
>>
>
>The general consensus is that reiserfs mounted with noatime is currently
>the best performer for a Linux Squid cache. This conclusion was arrived at
>after countless benchmarks in different configurations, mostly thanks to
>Joe Cooper.
>
>

---
    I'm slightly confused -- do you mean reiserfs is the best of the
journaled fs's, or the best including non-journaled async fs's (ext2?
fat32?).
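    (For reference, a rough sketch of what that noatime mount would look
like -- the device and mount point here are made up for illustration:

    # /etc/fstab: cache partition as reiserfs, no access-time updates
    /dev/sda5  /var/spool/squid  reiserfs  noatime  0 0

noatime just skips the inode access-time write on every read, which is
pure overhead for a cache directory.)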
>But you can always set up your own benchmarks to see what runs best on
>your hardware. For benchmarking I highly recommend the polygraph
>benchmark program with the polymix-4 workload.
>
>Only problem with benchmarking is that you need at least two extra
>computers to run the benchmark (one acting as client, one acting as
>server), and that it takes a few tries before one gets used to running
>the benchmarks.
>  
>
---
    Doing benchmarks right is fairly difficult.  So many variables and
parameters can affect things -- even just the choice of the fs's default
allocation unit.  If one format program defaults to a 512-byte allocation
block while another sets up 16 KB blocks, that alone could explain a
difference in performance if most reads/writes are >512 bytes and <16 KB.
    Do you know offhand what reiserfs's default allocation size is?
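    (A quick way to see whether write size matters on a given disk -- a
minimal sketch in Python, with the file name, total size, and block sizes
all invented for illustration:

    import os, time

    TOTAL = 64 * 1024 * 1024   # write 64 MB at each block size

    for bs in (512, 4096, 16384):
        buf = b"\0" * bs
        start = time.time()
        fd = os.open("testfile", os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        for _ in range(TOTAL // bs):
            os.write(fd, buf)
        os.fsync(fd)           # flush, so we time the disk, not the cache
        os.close(fd)
        os.unlink("testfile")
        print("%5d bytes/write: %.2f s" % (bs, time.time() - start))

It only times sequential writes, so it's no substitute for polygraph, but
it would show whether the 512-byte-vs-16KB effect is real on your drive.)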
> > I'm guessing, but might a journaling fs slow it down?
>
>Depends.
>
>A journaled filesystem can be a lot faster than a synchronous
>non-journaled filesystem and also gives a better level of fault
>tolerance.
>
>A simple asynchronous non-journaled filesystem is almost always faster
>than a journaled filesystem, but is at the same time very sensitive to
>errors.
>  
>
---
   Aren't ext2, fat32, ufs, etc. all pretty much async/non-journaled?
Weren't they (and in many cases, aren't they still) used for decades
without being "sensitive"?  Yes, a system crash required an fsck/chkdsk,
but if the OS doesn't crash that often, is it really "sensitive"?
    FAT32 and ext2 only misbehave during system failures (a common event
pre-Win2000), but Win2k and XP don't keel over and die unexpectedly as
often, and only rarely do I have uncontrolled shutdowns -- and my Linux
system?  My average uptime over the past 2 months has been about a week
(installing new HW), and half of those reboots were crashes -- and I use a
journaling fs (XFS).  Before that, last wtmp averaged out at a 29-day
uptime.
    Bugs happen in journaling fs's too -- all of the files I'd modified
the previous day had '0's written throughout them.  Yep -- somehow the
journal replayed all of the file transactions going back about 36 hours
as all binary zeros.  The backup job files created during the morning
were erased (on a separate hard disk); the backup from the morning before
was intact.  I've never had a 'sensitive' filesystem do such comprehensive
'forgetting' of every file touched in the previous 24 hours -- log files
were reset as though the 24th never happened.  Trippy.  A very rare
fluke...but also, unfortunately, a possibility.
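    (To put a rough number on the async-vs-sync difference Henrik
mentions, a minimal sketch in Python -- file names and counts are made
up, and an os.fsync after every write crudely imitates a synchronous
filesystem:

    import os, time

    def write_files(n, sync):
        start = time.time()
        for i in range(n):
            fd = os.open("f%d" % i, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
            os.write(fd, b"x" * 4096)
            if sync:
                os.fsync(fd)    # force each write to disk immediately
            os.close(fd)
            os.unlink("f%d" % i)
        return time.time() - start

    print("async: %.2f s" % write_files(500, False))
    print("sync:  %.2f s" % write_files(500, True))

On a single disk the sync case typically loses badly, since every write
waits for the platter instead of the buffer cache.)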
>>I recently ran a Sandra filesystem benchmark on FAT32/NTFS and found
>>NTFS was around 3-4x slower than FAT32.  Surprised me, since NTFS is
>>supposed to be MS's state-of-the-fart file system, but I wondered if the
>>journaling was slowing it.
>>    
>>
>
>NTFS is a cool filesystem design, but yes, journaling slows it down a
>little. It is not really fair to compare NTFS to FAT32 on NT, as FAT32
>there is completely asynchronous with absolutely no fault tolerance.
>  
>
----
    What other Windows filesystem would one compare NTFS to?  BTW, at one
point I thought I remembered FAT32 being synchronous on Linux.
Theoretically, with no support for access rights or file ownership and
limited time-field accuracy, FAT32 should run faster than ext2.
    But -- for a 'temporary internet cache', how much fault tolerance
does one need?  I could see (if memory were cheap enough) running squid
on a RAM disk.  If your server stays up for a month at a time, I think
the effect of losing the cache once a month would be negligible compared
to the benefit of zero-ms disk access...
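    (The RAM-disk version of that would look something like the sketch
below -- the mount point and sizes are invented; cache_dir's numbers are
squid's usual "size-in-MB, L1 dirs, L2 dirs" triple:

    # mount a RAM-backed filesystem (contents vanish on reboot)
    mount -t tmpfs -o size=512m tmpfs /var/spool/squid-ram

    # squid.conf: 400 MB cache, 16 first-level / 256 second-level dirs
    cache_dir ufs /var/spool/squid-ram 400 16 256

You'd just have to re-run squid -z after every boot to recreate the cache
directory structure, since tmpfs starts out empty.)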
>>I wonder...if one stuck a mySQL database on the back end of squid for a
>>FS driver and ran mySQL using one big 'file' that was named
>>/dev/sdAx...or weirder, /dev/raw/sdAx (or whatever the syntax would be).
>>    
>>
>
>Yuck.. why would one want to do so?
>  
>
---
    I dunno...the algorithms for storing and retrieving data in a
database might have been given more research bucks for speed optimization
than squid's database-on-top-of-a-filesystem has.  It's a WAG (Wild-Ass
Guess)...but databases place heavy emphasis on getting high TPS --
something squid (and apache) also care about.  And for squid it's mostly
retrieval, not a great deal of processing...I'd think retrieving static
data is something a database would have to have excellent performance on
to be successful.  Maybe squid's algorithms on top of a Linux filesystem
are better than the overhead of a database lookup on a raw or
block-cached device.  That's possible, but I'd tend to believe that, on
average, the odds would favor the database backend on either a raw or
block device (this is knowing nothing about the actual algorithms or any
specifics -- just the general concepts; reality could be quite different
from 'theory').
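    (Just to make the shape of the idea concrete -- this is NOT how squid
stores objects, and I've swapped sqlite in for mySQL only to keep the
sketch self-contained; the table and function names are invented:

    import sqlite3

    db = sqlite3.connect("cache.db")
    db.execute("CREATE TABLE IF NOT EXISTS objects"
               " (url TEXT PRIMARY KEY, body BLOB)")

    def store(url, body):
        # one indexed insert replaces squid's directory-tree file create
        db.execute("INSERT OR REPLACE INTO objects VALUES (?, ?)",
                   (url, body))
        db.commit()

    def fetch(url):
        row = db.execute("SELECT body FROM objects WHERE url = ?",
                         (url,)).fetchone()
        return row[0] if row else None   # None == cache miss

    store("http://example.com/", b"<html>...</html>")
    print(fetch("http://example.com/"))

Whether one B-tree lookup beats open()+read() through the filesystem's
own directory hashing is exactly the open question above.)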
>>Is it possible it would slow things down if you gave squid too large a 
>>partition, i.e. is it
>>better to have 5% usage of a 10G partition or 50% usage of a 1G 
>>partition?
>>    
>>
>
>On most UNIX type filesystems you should aim for having at least 20% free 
>space for optimal performance.
>
>  
>
---
    Both of the above examples meet that criterion.
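    (Checking the 20%-free rule is a short Python sketch -- the path is
just an example:

    import os

    st = os.statvfs("/var/spool/squid")
    free = st.f_bavail / float(st.f_blocks)
    print("free: %.0f%%" % (free * 100))
    if free < 0.20:
        print("under 20% free -- block allocation may start to slow down")

f_bavail counts the blocks available to non-root users, which is the
number that matters once the fs starts hunting for free blocks.)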
>>Maybe it's all so dwarfed by network latency, the local fs doesn't
>>matter (really just 1-2 users of cache)....-- in fact that's probably
>>the case...might as well use a rewritable CD-ROM for all the speed we
>>have here (about 1/100th internal 100Bt ethernet)...
>>    
>>
>
>Right.. for this setup mostly anything will do, I think. But the
>rewritable CD is probably a little too slow (very long setup times for
>writing, and it cannot read without first terminating the write) and may
>eventually wear out both the media and the mechanics of the drive..
>  
>
What if it were an asynchronous/buffered rewritable CD? :-)   Yeah, it
might have some wear & tear...but if you set your flush time sufficiently
high, you could minimize the number of writes...but yeah, it's probably
not an ideal choice...though if I were on a 56K modem, it might be
sufficient.
When is the tech-2nd-world-nation US going to start delivering 10Mb/s for
$20/month...:-)...
I mean, really, "broadband" as it stands now is pretty darn narrow for
video distribution -- especially if we are comparing it to HDTV
standards....the movie industry hints at what it would offer if broadband
were more widely accepted, but who's going to accept a 320x240 view of
The Matrix in a tiny window on a 1600x1200 monitor?   Ick!
But I'm rambling....(again)....
>Regards
>Henrik
>
Thanks for the info....though I'm still wondering whether reiser was run
against ext2 and an async version of fat32....theoretically, fat32 could
be useful for tmp files on RAM disks or something, because of its low
overhead...no?
linda