squid suggestions from Oskar Pearson on 1997-03-10 (squid-dev)

From: Oskar Pearson <oskar@dont-contact.us>
Date: Mon, 10 Mar 1997 10:33:04 +0200 (GMT)

Hi people

I recently wrote a Perl script that read a word from a text file, opened a
unique file, wrote the text and then closed that filehandle.

I found that it was spending at least 20% of its time in system mode,
taking forever and being very inefficient.

I rewrote the code so that it opened all of the filehandles beforehand, and found
that it was spending 99 percent of its time in user mode. Much better.
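
The Perl script itself is not shown, so the following is only a rough C analogue of the two strategies described above. It is a minimal sketch that assumes a single output file (the original used one file per word); the function names are hypothetical.

```c
#include <stdio.h>

/* Strategy A: open, write and close the file once per word. */
int write_reopening(const char *path, const char *const *words, int n)
{
    for (int i = 0; i < n; i++) {
        FILE *f = fopen(path, "a");   /* one open() per word */
        if (f == NULL)
            return -1;
        fprintf(f, "%s\n", words[i]);
        fclose(f);                    /* one close() per word */
    }
    return 0;
}

/* Strategy B: open the file once, write every word, close once. */
int write_keeping_open(const char *path, const char *const *words, int n)
{
    FILE *f = fopen(path, "a");
    if (f == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        fprintf(f, "%s\n", words[i]);
    return fclose(f) == 0 ? 0 : -1;
}
```

Both functions leave the same bytes in the file; the difference is only in how many open and close system calls are made along the way.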

I thought it was just a Perl inefficiency, as an strace seemed to show that
it was doing an fstat and a couple of other calls every time it opened and
closed the file, so I decided to do a test:

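>cat temp.c
#include <stdio.h>

int main(void)
{
        FILE *outfile;
        long a; /* long: a 16-bit int could not count to 1000000 */

        /* Open and close the same file a million times. */
        for (a = 0; a < 1000000; a++) {
                outfile = fopen("tempfile.1", "w");
                if (outfile == NULL) {
                        perror("tempfile.1");
                        return 1;
                }
                fclose(outfile);
        }
        return 0;
}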
----------------------------
On Linux:
> time ./temp
17.810u 78.250s 1:36.31 99.7% 0+0k 0+0io 11pf+0w

On Solaris 2.5:
> time ./temp
37.16u 526.60s 17:32.49 53.5%
----------------------------

Sure enough, it's spending huge amounts of time opening and closing the files.

I have yet to test this on other operating systems (I run Linux).

I have long been wondering why our caches spend so much time in system mode,
and believe that a lot of it could be caused by opening and closing
sockets and filehandles.

There is not much we can do about sockets, but more on that later.

I think that creating one huge file which contains all the cached pages
could speed up squid substantially.
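
To make the "one huge file" idea concrete, here is a minimal sketch in which every cached object lives in a fixed-size slot of a single store file that stays open, so reading or writing an object costs a seek rather than an open and a close. The slot size and the function names `slot_write` and `slot_read` are assumptions for illustration only; this is not how Squid stores objects.

```c
#include <stdio.h>

#define SLOT_SIZE 4096L   /* hypothetical fixed slot size in bytes */

/* Write one object into slot number `slot` of an already-open store file. */
int slot_write(FILE *store, long slot, const void *data, size_t len)
{
    if (len > (size_t)SLOT_SIZE)
        return -1;                        /* object does not fit in one slot */
    if (fseek(store, slot * SLOT_SIZE, SEEK_SET) != 0)
        return -1;
    if (fwrite(data, 1, len, store) != len)
        return -1;
    return fflush(store) == 0 ? 0 : -1;
}

/*
 * Read up to `len` bytes from slot number `slot`.
 * Returns the number of bytes read, or -1 if the seek fails.
 */
long slot_read(FILE *store, long slot, void *buf, size_t len)
{
    if (len > (size_t)SLOT_SIZE)
        len = (size_t)SLOT_SIZE;          /* never read past the slot */
    if (fseek(store, slot * SLOT_SIZE, SEEK_SET) != 0)
        return -1;
    return (long)fread(buf, 1, len, store);
}
```

A real design would also need an index from URL to slot, handling for objects larger than one slot, and free-space management, which is where most of the development effort would go.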

I created a 30 000 000 byte file (hopefully large enough to avoid caching,
since I only have 16M of RAM on my laptop...) by pressing "a" 30000000 times in
vim. I then wrote a little C program that seeks to 1000000
pseudo-random places in the file.
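
For anyone who wants to repeat the test without the editor, a file of the same shape can be generated by a few lines of C. This is a sketch, not what was used for the timings in this message; the function name `make_fill_file` is hypothetical.

```c
#include <stdio.h>

/* Create `path` containing `size` copies of the byte 'a'. Returns 0 on success. */
int make_fill_file(const char *path, long size)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    for (long i = 0; i < size; i++) {
        if (putc('a', f) == EOF) {    /* stop on a write error, e.g. disk full */
            fclose(f);
            return -1;
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling make_fill_file("seektest", 30000000L) would produce a 30 000 000 byte file with the name the seek test expects.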

------------------------------------------
On Linux
time ./temp2
7.190u 18.100s 0:25.29 100.0% 0+0k 0+0io 735pf+0w

On Solaris
time ~/temp2
23.32u 395.12s 13:06.88 53.1%
----------------------------

Now - it seems that the machine is spending a higher percentage of time
in system mode, but is getting things done a lot faster.

Here is the code:
---------------------------------
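#include <stdlib.h>
#include <stdio.h>

#define FILE_SIZE 30000000L /* size of the "seektest" file in bytes */

int main(void)
{
        FILE *infile;
        long a;
        int b;

        /* "r": read the existing file; "w+" would truncate it to zero length. */
        infile = fopen("seektest", "r");
        if (infile == NULL) {
                perror("seektest");
                return 1;
        }
        for (a = 0; a < 1000000; a++) {
                /* rand() only reaches RAND_MAX, which may be as small as 32767. */
                fseek(infile, rand() % FILE_SIZE, SEEK_SET);
                b = getc(infile); /* I was worried that it would optimize */
                (void)b;
        }
        fclose(infile);
        return 0;
}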
---------------------------------

Now I don't really know what to think... Is there something I am missing,
or will a large file be better?

I think that a lot of development would have to go into the "large file"
approach... so saving less than 10% of system time might not
be worth it.

About the network socket/filehandles being opened:

Sun has (?recently?) moved their NFS3 system away from UDP to a system where
they open one (or more) TCP sockets and then do the transfers through that
full-duplex socket. I was wondering whether this would be more efficient than
UDP-based ICP traffic over reliable networks. We load balance between two cache
machines on the same Ethernet, and I think that it would be more
efficient there. With the current code, do we open a UDP socket for each and
every ICP query (I don't know if that is how UDP works; it is how TCP works),
or do we use an existing socket to just "send another packet please"?
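
On the general UDP question: UDP is connectionless, so a single socket can send any number of datagrams to any number of peers with sendto(), and nothing needs to be opened per packet. The sketch below illustrates that using the POSIX sockets API. It says nothing about what the Squid ICP code actually does; the function name `send_queries` and the payload are placeholders, not a real ICP packet.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Send `count` datagrams to `dest` through one already-open UDP socket.
 * Returns the number of datagrams that were sent in full.
 */
int send_queries(int sock, const struct sockaddr_in *dest, int count)
{
    static const char payload[] = "query";  /* placeholder, not an ICP packet */
    int sent = 0;

    for (int i = 0; i < count; i++) {
        /* Same socket every time: "send another packet please". */
        ssize_t n = sendto(sock, payload, sizeof payload, 0,
                           (const struct sockaddr *)dest, sizeof *dest);
        if (n == (ssize_t)sizeof payload)
            sent++;
    }
    return sent;
}
```

The socket would be created once with socket(AF_INET, SOCK_DGRAM, 0) and then reused for every query, including queries to different peers.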

Oskar
Received on Tue Jul 29 2003 - 13:15:40 MDT
